In [92]:
#Loading libraries required for generation of dataframes for indel paper analysis.
library(ggplot2)
library(tidyverse)
library(tidyr)
library(ggpubr)
library(dplyr)
library(ggridges)
library(ineq)
library(RColorBrewer)
library(stringr)
library(gglorenz)
library(readr)
library(scales)
Rows: 11 Columns: 21 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (9): Genome, Accession, BRC ID, Feature ID, Annotation, Feature Type, St... dbl (5): Genome ID, Start, End, Length, AA Length lgl (7): RefSeq Locus Tag, Alt Locus Tag, FIGfam ID, PATRIC genus-specific f... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| Genome | Genome ID | Accession | BRC ID | RefSeq Locus Tag | Alt Locus Tag | Feature ID | Annotation | Feature Type | Start | ⋯ | Length | Strand | FIGfam ID | PATRIC genus-specific families (PLfams) | PATRIC cross-genus families (PGfams) | Protein ID | AA Length | Gene Symbol | Product | GO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <dbl> | <chr> | <chr> | <lgl> | <lgl> | <chr> | <chr> | <chr> | <dbl> | ⋯ | <dbl> | <chr> | <lgl> | <lgl> | <lgl> | <lgl> | <dbl> | <chr> | <chr> | <lgl> |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.3 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.1715.2440.fwd | PATRIC | mat_peptide | 1715 | ⋯ | 726 | + | NA | NA | NA | NA | 242 | 1C(VP3) | VP3 (1C) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.4 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.2441.3331.fwd | PATRIC | mat_peptide | 2441 | ⋯ | 891 | + | NA | NA | NA | NA | 297 | 1D(VP1) | VP1 (1D) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.5 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.3332.3781.fwd | PATRIC | mat_peptide | 3332 | ⋯ | 450 | + | NA | NA | NA | NA | 150 | 2A | 2A | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.6 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.3782.4078.fwd | PATRIC | mat_peptide | 3782 | ⋯ | 297 | + | NA | NA | NA | NA | 99 | 2B | 2B | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.7 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.4079.5065.fwd | PATRIC | mat_peptide | 4079 | ⋯ | 987 | + | NA | NA | NA | NA | 329 | 2C | 2C | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.8 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5066.5323.fwd | PATRIC | mat_peptide | 5066 | ⋯ | 258 | + | NA | NA | NA | NA | 86 | 3A | 3A | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.9 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5324.5389.fwd | PATRIC | mat_peptide | 5324 | ⋯ | 66 | + | NA | NA | NA | NA | 22 | 3B(VPg) | 3B | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.10 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5390.5938.fwd | PATRIC | mat_peptide | 5390 | ⋯ | 549 | + | NA | NA | NA | NA | 183 | 3C | 3C | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.11 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5939.7324.fwd | PATRIC | mat_peptide | 5939 | ⋯ | 1386 | + | NA | NA | NA | NA | 462 | 3D | 3D | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.1 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.746.952.fwd | PATRIC | mat_peptide | 746 | ⋯ | 207 | + | NA | NA | NA | NA | 69 | 1A(VP4) | VP4 (1A) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.2 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.953.1714.fwd | PATRIC | mat_peptide | 953 | ⋯ | 762 | + | NA | NA | NA | NA | 254 | 1B(VP2) | VP2 (1B) | NA |
In [135]:
#Reading in dataframes need for figure generation and analysis
allStickles_final_input <- read_csv("allStickles_final_input_handle.csv")
alldel_final_input <- read_csv("alldel_final_input.csv")
Capsid_P2_Scores_shared <- read_csv("Capsid_P2_Scores_shared.csv")
Replication_P2_Scores_shared <- read_csv("Replication_P2_Scores_shared.csv")
Scores_Insertional_Handle_Fullproteome <- read_csv("Scores_Insertional_Handle_Fullproteome.csv")
Scores_Deletions_Fullproteome <- read_csv("Scores_Deletions_Fullproteome.csv")
Scores_Deletions1AA_Fullproteome_1AA <- read_csv("Scores_Deletions1AA_Fullproteome_1AA.csv")
Capsid_P2_Deletions_shared <- read_csv("Capsid_P2_Deletions_shared.csv")
Replication_P2_Deletion_shared <- read_csv("Replication_P2_Deletion_shared.csv")
Fullproteome_input_DMS_Enrich2_long <- read_csv("Fullproteome_input_DMS_Enrich2_long.csv")
Fullproteome_P2_DMS_Enrich2_long <- read_csv("Fullproteome_P2_DMS_Enrich2_long.csv")
Capsid_P2_DMS_Enrich2_shared <- read_csv("Capsid_P2_DMS_Enrich2_shared.csv")
Replication_P2_DMS_Enrich2_shared <- read_csv("Replication_P2_DMS_Enrich2_shared.csv")
allStickles_AA_final_input <- read_csv("allStickles_AA_final_input.csv")
Fullproteome_1AA_Enrich2_long <- read_csv("Fullproteome_1AA_Enrich2_long.csv")
Competition_Pool_Enrich2_Scores <- read_tsv("main_identifiers_scores_Competition_Pooled_Experiment.tsv", col_names = TRUE)
Scores_Competition_input <- read_tsv("Competition_Pool_Input.tsv", col_names = TRUE)
EV_Features <- read_csv("EV71_4643_Features.csv")
Rows: 2193 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): dataset, Seq dbl (2): indel, n ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 6579 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): dataset, Seq dbl (2): indel, n ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 862 Columns: 6 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): insertion, dataset dbl (4): RepA_score, RepB_score, RepC_score, indel ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 1331 Columns: 6 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): insertion, dataset dbl (4): RepA_score, RepB_score, RepC_score, indel ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 2193 Columns: 2 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," dbl (2): indel, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 6579 Columns: 3 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): dataset dbl (2): indel, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 2193 Columns: 3 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): dataset dbl (2): indel, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 2490 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," dbl (4): indel, RepA_score, RepB_score, RepC_score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 3962 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," dbl (4): indel, RepA_score, RepB_score, RepC_score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 43860 Columns: 3 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): AminoAcid dbl (2): position, count ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 43860 Columns: 3 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): AminoAcid dbl (2): position, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 16378 Columns: 9 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): hgvs, Aminoacid dbl (7): RepA_DMS_SE, RepA_DMS_score, RepB_DMS_SE, RepB_DMS_score, RepC_DMS_... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 25289 Columns: 9 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): hgvs, Aminoacid dbl (7): RepA_DMS_SE, RepA_DMS_score, RepB_DMS_SE, RepB_DMS_score, RepC_DMS_... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 43860 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (2): dataset, Seq dbl (2): indel, n ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 43860 Columns: 3 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): AminoAcid dbl (2): insertion, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 1933 Columns: 4 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "\t" chr (1): hgvs dbl (3): SE, epsilon, score ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 1932 Columns: 2 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "\t" chr (1): hgvs dbl (1): count ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Rows: 11 Columns: 21 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (9): Genome, Accession, BRC ID, Feature ID, Annotation, Feature Type, St... dbl (5): Genome ID, Start, End, Length, AA Length lgl (7): RefSeq Locus Tag, Alt Locus Tag, FIGfam ID, PATRIC genus-specific f... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| Genome | Genome ID | Accession | BRC ID | RefSeq Locus Tag | Alt Locus Tag | Feature ID | Annotation | Feature Type | Start | ⋯ | Length | Strand | FIGfam ID | PATRIC genus-specific families (PLfams) | PATRIC cross-genus families (PGfams) | Protein ID | AA Length | Gene Symbol | Product | GO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <dbl> | <chr> | <chr> | <lgl> | <lgl> | <chr> | <chr> | <chr> | <dbl> | ⋯ | <dbl> | <chr> | <lgl> | <lgl> | <lgl> | <lgl> | <dbl> | <chr> | <chr> | <lgl> |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.3 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.1715.2440.fwd | PATRIC | mat_peptide | 1715 | ⋯ | 726 | + | NA | NA | NA | NA | 242 | 1C(VP3) | VP3 (1C) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.4 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.2441.3331.fwd | PATRIC | mat_peptide | 2441 | ⋯ | 891 | + | NA | NA | NA | NA | 297 | 1D(VP1) | VP1 (1D) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.5 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.3332.3781.fwd | PATRIC | mat_peptide | 3332 | ⋯ | 450 | + | NA | NA | NA | NA | 150 | 2A | 2A | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.6 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.3782.4078.fwd | PATRIC | mat_peptide | 3782 | ⋯ | 297 | + | NA | NA | NA | NA | 99 | 2B | 2B | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.7 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.4079.5065.fwd | PATRIC | mat_peptide | 4079 | ⋯ | 987 | + | NA | NA | NA | NA | 329 | 2C | 2C | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.8 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5066.5323.fwd | PATRIC | mat_peptide | 5066 | ⋯ | 258 | + | NA | NA | NA | NA | 86 | 3A | 3A | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.9 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5324.5389.fwd | PATRIC | mat_peptide | 5324 | ⋯ | 66 | + | NA | NA | NA | NA | 22 | 3B(VPg) | 3B | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.10 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5390.5938.fwd | PATRIC | mat_peptide | 5390 | ⋯ | 549 | + | NA | NA | NA | NA | 183 | 3C | 3C | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.11 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.5939.7324.fwd | PATRIC | mat_peptide | 5939 | ⋯ | 1386 | + | NA | NA | NA | NA | 462 | 3D | 3D | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.1 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.746.952.fwd | PATRIC | mat_peptide | 746 | ⋯ | 207 | + | NA | NA | NA | NA | 69 | 1A(VP4) | VP4 (1A) | NA |
| Enterovirus A71 Tainan/4643/98 | 39054.14 | MW298156 | fig|39054.14496.mat_peptide.2 | NA | NA | PATRIC.39054.14496.MW298156.mat_peptide.953.1714.fwd | PATRIC | mat_peptide | 953 | ⋯ | 762 | + | NA | NA | NA | NA | 254 | 1B(VP2) | VP2 (1B) | NA |
In [94]:
#Figure 1. Deep Insertion, Deletion, and Mutational Scanning of the EV-A71 proteome
#Bar plot of insertional handle input library
G_input_handle<-ggplot(allStickles_final_input)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#8BC3E4")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_handle
#Save figure
ggsave(plot = G_input_handle,filename = "meanPlot_inserts_input_handle.pdf",width =5,height=2)
#Ridgeline plot of insertional handle input library
Sublib <- c(0,77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193)
Sublib_label <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28")
allStickles_final_input$cat <- cut(allStickles_final_input$indel, Sublib, Sublib_label)
allStickles_final_input_filter<- allStickles_final_input %>%group_by(cat)%>% mutate(Median=median(n+1))
Ridgeplot_input_lib <- ggplot(allStickles_final_input_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30)) + scale_x_log10(lim = c(50, 1000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_lib
#Save figure
ggsave(plot = Ridgeplot_input_lib,filename = "Ridge_plot.pdf",device = pdf,width =4,height=4)
#Lorenz curve and Gini Coefficient of insertional handle input library
# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(allStickles_final_input$n)
# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(allStickles_final_input$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")
# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)
# Calculate Gini coefficient for each category
gini_results <- tapply(allStickles_final_input_filter$n, allStickles_final_input_filter$cat, Gini)
# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")
# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
lorenz_data <- Lc(n)
lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
return(lorenz_df)
}
# Calculate Lorenz curves for each category
lorenz_results <- by(allStickles_final_input_filter$n, allStickles_final_input_filter$cat, calculate_lorenz)
# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)
# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)
# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))
#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)
#Create a graph to plot Lorenz curves
Lorenz_Curve_sublib <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:28) ))) +
geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.22(0.08-0.34)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()
#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final <- Lorenz_Curve_sublib+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final
#Save figure
ggsave(plot = Lorenz_Curve_Final,filename = "Lorenz_Curve_Final.pdf",device = pdf,width =4,height=4)
#Statistics included in the Figure 2 description
# Count positions with count more than 1
positions_above_1_insertional_handle <- sum(allStickles_final_input$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_insertional_handle/2193, "\n")
#Bar plot of deletion 3 bp input library
alldel_final_input_3bp <- filter(alldel_final_input,Seq=="3bp")
G_input_del<-ggplot(alldel_final_input_3bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del
ggsave(plot = G_input_del,filename = "meanPlot_input_3bp.pdf",width = 5,height=2)
#Ridgeline plot of deletion 3 bp input library
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_3bp$cat <- cut(alldel_final_input_3bp$indel, Sublib, Sublib_label)
alldel_final_input_filter<-alldel_final_input_3bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_lib_del <- ggplot(alldel_final_input_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_lib_del
ggsave(plot = Ridgeplot_input_lib_del,filename = "Ridge_plot_del.pdf",device = pdf,width = 4,height=4)
#Lorenz curve and Gini Coefficient of deletion 3 bp input library
# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_3bp$n)
# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_3bp$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")
# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)
# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_3bp$n, alldel_final_input_3bp$cat, Gini)
# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")
# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
lorenz_data <- Lc(n)
lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
return(lorenz_df)
}
# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_3bp$n, alldel_final_input_3bp$cat, calculate_lorenz)
# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)
# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)
# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))
#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)
#Create a graph to plot Lorenz curves NA=grey=Masterdf)
Lorenz_Curve_sublib <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.36(0.06-0.46)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()
#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del <- Lorenz_Curve_sublib+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted") +theme(legend.position = "none")
Lorenz_Curve_Final_del
#Save Figure
ggsave(plot = Lorenz_Curve_Final_del,filename = "Lorenz_Curve_Final_del.pdf",device = pdf,width =4,height=4)
#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_3bp <- sum(alldel_final_input_3bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_3bp/2193, "\n")
#Bar plot of DMS input library
DMS_plot_input <- ggplot(Fullproteome_input_DMS_Enrich2_long)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=7),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count,fill="#3c7430"),binwidth = 5,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(51,102,153,204,255,306,357,408,459,510,561,612,662,712,762,812,862,916,970,1024,1078,1132,1186,1239,1292,1345,1398,1451,1504,1557,1610,1663,1716,1769,1822,1875,1928,1981,2034,2087,2140,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)+scale_fill_manual(values = "#3c7430")
ggsave(plot = DMS_plot_input,filename = "DMS_plot_input.pdf",width = 5,height=2)
DMS_plot_input
#DMS boxplot
Boxplot_DMS_Aminoacidcounts <- ggplot(Fullproteome_input_DMS_Enrich2_long, aes(y = count, x = factor(AminoAcid, levels = unique(AminoAcid)))) +ylim(0,400)+geom_boxplot(outlier.shape = NA)+theme_classic()+labs(y="Median Variant Count",x="")
Boxplot_DMS_Aminoacidcounts
#Save figure
ggsave(plot = Boxplot_DMS_Aminoacidcounts,filename = "Boxplot_DMS_Aminoacidcounts.pdf",width = 5,height=2)
#Statistics included in the Figure 2 description
#Count positions with count more than 1
Fullproteome_input_DMS_Enrich2_long_NAreplaced0 <- Fullproteome_input_DMS_Enrich2_long %>% replace(is.na(.), 0)
positions_above_1_DMS <- sum(Fullproteome_input_DMS_Enrich2_long_NAreplaced0$count > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_DMS/41667, "\n")
Picking joint bandwidth of 0.0384 Warning message: “Removed 8 rows containing non-finite values (`stat_density_ridges()`).”
Picking joint bandwidth of 0.0384 Warning message: “Removed 8 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.217829
| cat | gini | |
|---|---|---|
| <chr> | <dbl> | |
| 1 | 1 | 0.17690972 |
| 2 | 2 | 0.20266454 |
| 3 | 3 | 0.12101082 |
| 4 | 4 | 0.13268539 |
| 5 | 5 | 0.16395614 |
| 6 | 6 | 0.11511735 |
| 7 | 7 | 0.13078925 |
| 8 | 8 | 0.19452011 |
| 9 | 9 | 0.15415858 |
| 10 | 10 | 0.33690447 |
| 11 | 11 | 0.15459547 |
| 12 | 12 | 0.33111207 |
| 13 | 13 | 0.13254769 |
| 14 | 14 | 0.14797611 |
| 15 | 15 | 0.09756186 |
| 16 | 16 | 0.13106871 |
| 17 | 17 | 0.12301158 |
| 18 | 18 | 0.11436645 |
| 19 | 19 | 0.12500021 |
| 20 | 20 | 0.11941092 |
| 21 | 21 | 0.11839446 |
| 22 | 22 | 0.11920143 |
| 23 | 23 | 0.12470880 |
| 24 | 24 | 0.10588332 |
| 25 | 25 | 0.08427244 |
| 26 | 26 | 0.11721940 |
| 27 | 27 | 0.12847342 |
| 28 | 28 | 0.09546884 |
Minimum value: 0.08427244 Maximum value: 0.3369045
Number of positions with count more than 1: 1
Picking joint bandwidth of 0.034 Warning message: “Removed 85 rows containing non-finite values (`stat_density_ridges()`).”
Picking joint bandwidth of 0.034 Warning message: “Removed 85 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3600231
| cat | gini | |
|---|---|---|
| <chr> | <dbl> | |
| 1 | 1 | 0.13955728 |
| 2 | 2 | 0.08073368 |
| 3 | 3 | 0.09550911 |
| 4 | 4 | 0.06303381 |
| 5 | 5 | 0.08504353 |
| 6 | 6 | 0.08042455 |
| 7 | 7 | 0.07828678 |
| 8 | 8 | 0.38021542 |
| 9 | 9 | 0.38672246 |
| 10 | 10 | 0.09471075 |
| 11 | 11 | 0.07023564 |
| 12 | 12 | 0.11295881 |
| 13 | 13 | 0.06462484 |
| 14 | 14 | 0.11095486 |
| 15 | 15 | 0.06371616 |
| 16 | 16 | 0.05777935 |
| 17 | 17 | 0.06188949 |
| 18 | 18 | 0.12894821 |
| 19 | 19 | 0.14632343 |
| 20 | 20 | 0.46131001 |
| 21 | 21 | 0.10939927 |
| 22 | 22 | 0.16916241 |
| 23 | 23 | 0.10405539 |
| 24 | 24 | 0.11942941 |
| 25 | 25 | 0.14864221 |
| 26 | 26 | 0.11736034 |
| 27 | 27 | 0.08681335 |
| 28 | 28 | 0.11075934 |
| 29 | 29 | 0.09081287 |
| 30 | 30 | 0.12239759 |
| 31 | 31 | 0.09972610 |
| 32 | 32 | 0.17898990 |
| 33 | 33 | 0.10096861 |
| 34 | 34 | 0.08777201 |
| 35 | 35 | 0.12806367 |
| 36 | 36 | 0.11599762 |
| 37 | 37 | 0.11235481 |
| 38 | 38 | 0.08494888 |
| 39 | 39 | 0.15630112 |
| 40 | 40 | 0.08997237 |
| 41 | 41 | 0.07475519 |
| 42 | 42 | 0.12888948 |
| 43 | 43 | 0.08559853 |
| 44 | 44 | 0.10863150 |
| 45 | 45 | 0.18619782 |
Minimum value: 0.05777935 Maximum value: 0.46131
Number of positions with count more than 1: 0.998632
Warning message: “Removed 2193 rows containing non-finite values (`stat_summary_bin()`).” Warning message: “Removed 2193 rows containing non-finite values (`stat_summary_bin()`).”
Warning message: “Removed 7412 rows containing non-finite values (`stat_boxplot()`).”
Warning message: “Removed 7412 rows containing non-finite values (`stat_boxplot()`).”
Number of positions with count more than 1: 0.999952
In [95]:
#Supplementary Figure 2. Deep Deletion Scanning of the EV-A71 proteome
#Bar plot of deletion 6 bp input library
alldel_final_input_6bp <- filter(alldel_final_input,Seq=="6bp")
G_input_del_6bp<-ggplot(alldel_final_input_6bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del_6bp
#Save figure
ggsave(plot = G_input_del_6bp,filename = "meanPlot_input_6bp.pdf",width = 5,height=2)
#Ridgeline plot of deletion 6 bp input library
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_6bp$cat <- cut(alldel_final_input_6bp$indel, Sublib, Sublib_label)
alldel_final_input_6bp_filter<-alldel_final_input_6bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_6bp_lib_del <- ggplot(alldel_final_input_6bp_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_6bp_lib_del
#Save figure
ggsave(plot = Ridgeplot_input_6bp_lib_del,filename = "Ridge_plot_del_6bp.pdf",device = pdf,width = 4,height=4)
#Lorenz curve and Gini Coefficient of deletion 6 bp input library
# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_6bp_filter$n)
# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_6bp_filter$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")
# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)
# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_6bp_filter$n, alldel_final_input_6bp_filter$cat, Gini)
# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")
# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
lorenz_data <- Lc(n)
lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
return(lorenz_df)
}
# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_6bp_filter$n, alldel_final_input_6bp_filter$cat, calculate_lorenz)
# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)
# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)
# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))
#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)
#Create a graph to plot Lorenz curves
Lorenz_Curve_sublib_6bp <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.34(0.06-0.47)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()
#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del_6bp <- Lorenz_Curve_sublib_6bp+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final_del_6bp
#Save figure
ggsave(plot = Lorenz_Curve_Final_del_6bp,filename = "Lorenz_Curve_Final_del_6bp.pdf",device = pdf,width = 4,height=4)
#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_6bp <- sum(alldel_final_input_6bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_6bp/2193, "\n")
#Bar plot of deletion 9 bp input library
alldel_final_input_9bp <- filter(alldel_final_input,Seq=="9bp")
G_input_del_9bp<-ggplot(alldel_final_input_9bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del_9bp
#Save figure
ggsave(plot = G_input_del_9bp,filename = "meanPlot_input_9bp.pdf",width = 5,height=2)
#Ridgeline plot of deletion 9 bp input library
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_9bp$cat <- cut(alldel_final_input_9bp$indel, Sublib, Sublib_label)
alldel_final_input_9bp_filter<-alldel_final_input_9bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_9bp_lib_del <- ggplot(alldel_final_input_9bp_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_9bp_lib_del
ggsave(plot = Ridgeplot_input_9bp_lib_del,filename = "Ridge_plot_del_9bp.pdf",device = pdf,width = 4,height=4)
#Lorenz curve and Gini Coefficient of deletion 9 bp input library
# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_9bp_filter$n)
# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_9bp_filter$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")
# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)
# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_9bp_filter$n, alldel_final_input_9bp_filter$cat, Gini)
# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")
# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
lorenz_data <- Lc(n)
lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
return(lorenz_df)
}
# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_9bp_filter$n, alldel_final_input_9bp_filter$cat, calculate_lorenz)
# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)
# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)
# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))
#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)
#Create a graph to plot Lorenz curves NA=grey=Masterdf)
Lorenz_Curve_sublib_9bp <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.34(0.06-0.48)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()
#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del_9bp <- Lorenz_Curve_sublib_9bp+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final_del_9bp
#Save figure
ggsave(plot = Lorenz_Curve_Final_del_9bp,filename = "Lorenz_Curve_Final_del_9bp.pdf",device = pdf,width = 4,height=4)
#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_9bp <- sum(alldel_final_input_9bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_9bp/2193, "\n")
#Boxplot of different deletion sizes
Boxplot_Delsize_input <- ggplot(alldel_final_input, aes(y=n,x=Seq))+geom_boxplot(outlier.shape = NA)+ylim(0,500)+ylab("Median Variant Count")+xlab("")+theme_classic()
Boxplot_Delsize_input
#Save figure
ggsave(plot = Boxplot_Delsize_input,filename = "Boxplot_Delsize_input.pdf",device = pdf,width = 1,height=1)
#Scatter plots for different sizes of deletions
#Tidying dataframe for 3bp, creating a column 3bp with input counts
alldel_final_input_filtered_3bp <- filter(alldel_final_input,Seq=="3bp")
alldel_final_input_filtered_3bp <- pivot_wider(alldel_final_input_filtered_3bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_3bp <- select(alldel_final_input_filtered_3bp,indel,"3bp")
#Tidying dataframe for 6bp, creating a column 6bp with input counts
alldel_final_input_filtered_6bp <- filter(alldel_final_input,Seq=="6bp")
alldel_final_input_filtered_6bp <- pivot_wider(alldel_final_input_filtered_6bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_6bp <- select(alldel_final_input_filtered_6bp,indel,"6bp")
#Tidying dataframe for 9bp, creating a column 9bp with input counts
alldel_final_input_filtered_9bp <- filter(alldel_final_input,Seq=="9bp")
alldel_final_input_filtered_9bp <- pivot_wider(alldel_final_input_filtered_9bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_9bp <- select(alldel_final_input_filtered_9bp,indel,"9bp")
#Merging all three dataframes
alldel_final_input_filtered_3_6_bp <- merge(alldel_final_input_filtered_3bp,alldel_final_input_filtered_6bp,by="indel",all.x=TRUE,all.y=TRUE)
alldel_final_input_filtered_3_9_bp <- merge(alldel_final_input_filtered_3bp,alldel_final_input_filtered_9bp,by="indel",all.x=TRUE,all.y=TRUE)
alldel_final_input_filtered_6_9_bp <- merge(alldel_final_input_filtered_6bp,alldel_final_input_filtered_9bp,by="indel",all.x=TRUE,all.y=TRUE)
#Plotting 1AA against 2AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_1AA_2AA <- ggplot(alldel_final_input_filtered_3_6_bp, aes(x = `3bp`, y = `6bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "1 AA", y = "2 AA") +theme_classic()
Scatter_input_1AA_2AA
ggsave(plot = Scatter_input_1AA_2AA,filename = "Scatter_input_1AA_2AA.pdf",device = pdf,width = 2,height=2)
model_1AA_2AA <- lm(`3bp` ~ `6bp`, data = alldel_final_input_filtered_3_6_bp)
summary_model_1AA_2AA <- summary(model_1AA_2AA)
summary_model_1AA_2AA
#Plotting 1AA against 3AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_1AA_3AA <- ggplot(alldel_final_input_filtered_3_9_bp, aes(x = `3bp`, y = `9bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "1 AA", y = "3 AA") +theme_classic()
Scatter_input_1AA_3AA
ggsave(plot = Scatter_input_1AA_3AA,filename = "Scatter_input_1AA_3AA.pdf",device = pdf,width = 2,height=2)
model_1AA_3AA <- lm(`3bp` ~ `9bp`, data = alldel_final_input_filtered_3_9_bp)
summary_model_1AA_3AA <- summary(model_1AA_3AA)
summary_model_1AA_3AA
#Plotting 1AA against 3AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_2AA_3AA <- ggplot(alldel_final_input_filtered_6_9_bp, aes(x = `6bp`, y = `9bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "2 AA", y = "3 AA") +theme_classic()
Scatter_input_2AA_3AA
ggsave(plot = Scatter_input_2AA_3AA,filename = "Scatter_input_2AA_3AA.pdf",device = pdf,width = 2,height=2)
model_2AA_3AA <- lm(`6bp` ~ `9bp`, data = alldel_final_input_filtered_6_9_bp)
summary_model_2AA_3AA <- summary(model_2AA_3AA)
summary_model_2AA_3AA
Picking joint bandwidth of 0.0369 Warning message: “Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Picking joint bandwidth of 0.0369 Warning message: “Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3375039
| cat | gini | |
|---|---|---|
| <chr> | <dbl> | |
| 1 | 1 | 0.20580339 |
| 2 | 2 | 0.10908092 |
| 3 | 3 | 0.11069086 |
| 4 | 4 | 0.06527017 |
| 5 | 5 | 0.11075620 |
| 6 | 6 | 0.08796619 |
| 7 | 7 | 0.10409583 |
| 8 | 8 | 0.46991261 |
| 9 | 9 | 0.32344657 |
| 10 | 10 | 0.10035349 |
| 11 | 11 | 0.08456274 |
| 12 | 12 | 0.07399836 |
| 13 | 13 | 0.08440298 |
| 14 | 14 | 0.11709532 |
| 15 | 15 | 0.09539333 |
| 16 | 16 | 0.07071512 |
| 17 | 17 | 0.10396686 |
| 18 | 18 | 0.10768011 |
| 19 | 19 | 0.15565094 |
| 20 | 20 | 0.43460996 |
| 21 | 21 | 0.13208113 |
| 22 | 22 | 0.15789828 |
| 23 | 23 | 0.20570313 |
| 24 | 24 | 0.11960660 |
| 25 | 25 | 0.16025306 |
| 26 | 26 | 0.12935435 |
| 27 | 27 | 0.11695801 |
| 28 | 28 | 0.11278076 |
| 29 | 29 | 0.09707268 |
| 30 | 30 | 0.09029672 |
| 31 | 31 | 0.10500336 |
| 32 | 32 | 0.20945212 |
| 33 | 33 | 0.08543672 |
| 34 | 34 | 0.08426839 |
| 35 | 35 | 0.12025691 |
| 36 | 36 | 0.09713162 |
| 37 | 37 | 0.08402851 |
| 38 | 38 | 0.12017671 |
| 39 | 39 | 0.09802442 |
| 40 | 40 | 0.10237690 |
| 41 | 41 | 0.08797313 |
| 42 | 42 | 0.09456121 |
| 43 | 43 | 0.07284639 |
| 44 | 44 | 0.08917211 |
| 45 | 45 | 0.19898368 |
Minimum value: 0.06527017 Maximum value: 0.4699126
Number of positions with count more than 1: 0.994072
Picking joint bandwidth of 0.0378 Warning message: “Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Picking joint bandwidth of 0.0378 Warning message: “Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3376191
| cat | gini | |
|---|---|---|
| <chr> | <dbl> | |
| 1 | 1 | 0.15179579 |
| 2 | 2 | 0.11317787 |
| 3 | 3 | 0.12487545 |
| 4 | 4 | 0.07973055 |
| 5 | 5 | 0.10663456 |
| 6 | 6 | 0.08536002 |
| 7 | 7 | 0.08228591 |
| 8 | 8 | 0.47617816 |
| 9 | 9 | 0.36233575 |
| 10 | 10 | 0.08769753 |
| 11 | 11 | 0.07148973 |
| 12 | 12 | 0.08432235 |
| 13 | 13 | 0.08868911 |
| 14 | 14 | 0.10014758 |
| 15 | 15 | 0.06397132 |
| 16 | 16 | 0.07600504 |
| 17 | 17 | 0.06726681 |
| 18 | 18 | 0.15008007 |
| 19 | 19 | 0.20519693 |
| 20 | 20 | 0.47406300 |
| 21 | 21 | 0.12644038 |
| 22 | 22 | 0.19155743 |
| 23 | 23 | 0.15228502 |
| 24 | 24 | 0.14261428 |
| 25 | 25 | 0.17640591 |
| 26 | 26 | 0.13864841 |
| 27 | 27 | 0.09946769 |
| 28 | 28 | 0.09599150 |
| 29 | 29 | 0.09180535 |
| 30 | 30 | 0.09926261 |
| 31 | 31 | 0.12243629 |
| 32 | 32 | 0.24514666 |
| 33 | 33 | 0.08974250 |
| 34 | 34 | 0.11370892 |
| 35 | 35 | 0.13667322 |
| 36 | 36 | 0.08679486 |
| 37 | 37 | 0.08880805 |
| 38 | 38 | 0.12168116 |
| 39 | 39 | 0.12165087 |
| 40 | 40 | 0.09642445 |
| 41 | 41 | 0.11122425 |
| 42 | 42 | 0.10559112 |
| 43 | 43 | 0.10377614 |
| 44 | 44 | 0.08117261 |
| 45 | 45 | 0.18701958 |
Minimum value: 0.06397132 Maximum value: 0.4761782
Number of positions with count more than 1: 0.99544
Warning message: “Removed 333 rows containing non-finite values (`stat_boxplot()`).”
Warning message: “Removed 333 rows containing non-finite values (`stat_boxplot()`).” Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `3bp` ~ `6bp`, data = alldel_final_input_filtered_3_6_bp)
Residuals:
Min 1Q Median 3Q Max
-591.72 -47.53 -3.33 40.49 2410.24
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.109358 3.085098 -2.304 0.0213 *
`6bp` 0.955911 0.007645 125.032 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 101.8 on 2191 degrees of freedom
Multiple R-squared: 0.8771, Adjusted R-squared: 0.877
F-statistic: 1.563e+04 on 1 and 2191 DF, p-value: < 2.2e-16
Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `3bp` ~ `9bp`, data = alldel_final_input_filtered_3_9_bp)
Residuals:
Min 1Q Median 3Q Max
-451.67 -48.93 -1.15 47.98 658.24
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.61006 2.91255 -5.36 9.22e-08 ***
`9bp` 1.02817 0.00761 135.11 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 95.03 on 2191 degrees of freedom
Multiple R-squared: 0.8928, Adjusted R-squared: 0.8928
F-statistic: 1.825e+04 on 1 and 2191 DF, p-value: < 2.2e-16
Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Warning message: “Transformation introduced infinite values in continuous x-axis” Warning message: “Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `6bp` ~ `9bp`, data = alldel_final_input_filtered_6_9_bp)
Residuals:
Min 1Q Median 3Q Max
-2105.26 -40.21 -1.25 39.43 846.98
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.434942 2.903217 3.594 0.000332 ***
`9bp` 1.005189 0.007586 132.512 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 94.73 on 2191 degrees of freedom
Multiple R-squared: 0.8891, Adjusted R-squared: 0.889
F-statistic: 1.756e+04 on 1 and 2191 DF, p-value: < 2.2e-16
In [96]:
#Supplementary Figure 3. Deep Insertion Scanning of the EV-A71 proteome
#Bar plot of 5AA input library
G_input_AA<-ggplot(allStickles_AA_final_input)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 20,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n),binwidth = 5,fun = function(X) {log10(sum(X))} )+facet_grid(factor(Seq,c("A","C","D","E","F","G","H","I","K","L","M","N","P","Q","R","S","T","V","W","Y"))~.)+ facet_wrap(~Seq, ncol = 5)+
theme_classic()+ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
#Save figure
ggsave(plot = G_input_AA,filename = "G_input_AA.pdf",width = 15,height=10)
#Print the result
positions_above_1_AA <- sum(allStickles_AA_final_input$n > 1)
cat("Number of positions with count more than 1:", positions_above_1_AA/43860, "\n")
#Calculate median count for different variants
Boxplot_1AA_input <- ggplot(allStickles_AA_final_input, aes(y=n,x=Seq))+geom_boxplot(outlier.shape = NA)+ylim(0,65)+ylab("Median Variant Count")+xlab("")+theme_classic()
Boxplot_1AA_input
# Calculate median for each unique sequence
unique_Seq <- unique(allStickles_AA_final_input$Seq)
Medians <- numeric(length(unique_Seq))
for (i in 1:length(unique_Seq)) {
Medians[i] <- median(allStickles_AA_final_input$n[allStickles_AA_final_input$Seq == unique_Seq[i]])
}
# Create a data frame with unique sequences and their medians
Median_1AA_Seq <- data.frame(seq = unique_Seq, median_count = Medians)
Median_1AA_Seq
ggsave(plot = Boxplot_1AA_input,filename = "Boxplot_1AA_input.pdf",width = 3,height=2)
Saturation_1AA <- allStickles_AA_final_input %>% group_by(Seq) %>% summarize(saturation = sum(n > 1)/2193)
Saturation_1AA
Number of positions with count more than 1: 0.9763794
Warning message: “Removed 3542 rows containing non-finite values (`stat_boxplot()`).”
| seq | median_count |
|---|---|
| <chr> | <dbl> |
| A | 16 |
| C | 19 |
| D | 18 |
| E | 18 |
| F | 15 |
| G | 17 |
| H | 20 |
| I | 17 |
| K | 19 |
| L | 17 |
| M | 16 |
| N | 22 |
| P | 17 |
| Q | 18 |
| R | 18 |
| S | 19 |
| T | 19 |
| V | 16 |
| W | 15 |
| Y | 19 |
Warning message: “Removed 3542 rows containing non-finite values (`stat_boxplot()`).”
| Seq | saturation |
|---|---|
| <chr> | <dbl> |
| A | 0.9799362 |
| C | 0.9799362 |
| D | 0.9767442 |
| E | 0.9794802 |
| F | 0.9653443 |
| G | 0.9776562 |
| H | 0.9803922 |
| I | 0.9762882 |
| K | 0.9781122 |
| L | 0.9671683 |
| M | 0.9676243 |
| N | 0.9872321 |
| P | 0.9694482 |
| Q | 0.9794802 |
| R | 0.9772002 |
| S | 0.9831281 |
| T | 0.9840401 |
| V | 0.9726402 |
| W | 0.9648883 |
| Y | 0.9808482 |
In [97]:
#Supplementary Figure 4. Deep Mutational Scanning of the EV-A71 proteome
#Heatmap for DMS input
# Convert Position to a factor with breaks
breaks_inputheatmap <- seq(0, 2193, by = 250)
# Create a function to assign labels based on position
get_label_input <- function(position) {
sub_labels <- c("A", "B", "C", "D", "E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V")
sub_index <- findInterval(position, breaks_inputheatmap)
return(sub_labels[sub_index])
}
#create a colum sub for the different rows of the heatmap
Fullproteome_input_DMS_Enrich2_long$sub <- sapply(Fullproteome_input_DMS_Enrich2_long$position, get_label_input)
#Use amino acid as factors
Fullproteome_input_DMS_Enrich2_long$AminoAcid=factor(Fullproteome_input_DMS_Enrich2_long$AminoAcid,levels = c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
#Heatmap of DMS input library
DMS_Heatmap_input <- ggplot(Fullproteome_input_DMS_Enrich2_long) + geom_tile(aes(position,AminoAcid, fill = log10(count))) +
scale_fill_gradient2(limits = c(0,4),low = muted("white"), high = muted("aquamarine1"),space = "Lab",na.value = "black") +
theme_classic() + facet_wrap(sub ~ ., scales = "free_x",ncol=1) + theme(aspect.ratio = 0.1,axis.text.y = element_text(size = 3))+ theme(strip.text.x = element_blank())
DMS_Heatmap_input
#Save figure
ggsave(plot = DMS_Heatmap_input,filename = "DMS_Heatmap_input.pdf",width = 8,height=11)
In [98]:
#Supplementary Figure 5. Reproducibility of the Deep Insertion, Deletion, and Mutational Scanning experiments
#Comparing the replicates for insertional handle library: Passage 2. Scatter plots of replicates against each other.
Capsid_P2_Scores_shared
Replication_P2_Scores_shared
#Passage 2 Replication
#RepA vs RepB Passage 2 Replication
Scatter_P2_insertion_replication_RepA_RepB <- ggplot(Replication_P2_Scores_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_insertion_replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepA_RepB,filename = "Scatter_P2_insertion_replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepB_replication <- lm(`RepA_score` ~ `RepB_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepB_replication <- summary(model_P2_insertion_RepA_RepB_replication)
summary_model_P2_insertion_RepA_RepB_replication
#RepA vs RepC Passage 2 Replication
Scatter_P2_insertion_replication_RepA_RepC <- ggplot(Replication_P2_Scores_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepA_RepC,filename = "Scatter_P2_insertion_replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepC_replication <- lm(`RepA_score` ~ `RepC_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepC_replication <- summary(model_P2_insertion_RepA_RepC_replication)
summary_model_P2_insertion_RepA_RepC_replication
#RepB vs RepC Passage 2 Replication
Scatter_P2_insertion_replication_RepB_RepC <- ggplot(Replication_P2_Scores_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepB_RepC,filename = "Scatter_P2_insertion_replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepB_RepC_replication <- lm(`RepB_score` ~ `RepC_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepB_RepC_replication <- summary(model_P2_insertion_RepB_RepC_replication)
summary_model_P2_insertion_RepB_RepC_replication
#Passage 2 Capsid
#RepA vs RepB Passage 2 Capsid
Scatter_P2_insertion_RepA_RepB <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_insertion_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepA_RepB,filename = "Scatter_P2_insertion_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepB <- summary(model_P2_insertion_RepA_RepB)
summary_model_P2_insertion_RepA_RepB
#RepA vs RepC Passage 2 Capsid
Scatter_P2_insertion_RepA_RepC <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepA_RepC,filename = "Scatter_P2_insertion_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepC <- summary(model_P2_insertion_RepA_RepC)
summary_model_P2_insertion_RepA_RepC
#RepB vs RepC Passage 2 Capsid
Scatter_P2_insertion_RepB_RepC <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepB_RepC,filename = "Scatter_P2_insertion_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepB_RepC <- summary(model_P2_insertion_RepB_RepC)
summary_model_P2_insertion_RepB_RepC
| insertion | RepA_score | RepB_score | RepC_score | indel | dataset |
|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> |
| p.(1handle) | -4.955753 | -4.440084 | -4.490881 | 1 | handle |
| p.(2handle) | -5.169184 | -4.653516 | -4.704312 | 2 | handle |
| p.(3handle) | -5.054773 | -4.539105 | -4.589901 | 3 | handle |
| p.(4handle) | -4.826284 | -4.310616 | -4.361413 | 4 | handle |
| p.(5handle) | -4.684051 | -4.168383 | -4.219179 | 5 | handle |
| p.(6handle) | -5.050274 | -4.534606 | -4.585402 | 6 | handle |
| p.(7handle) | -5.027468 | -4.511800 | -4.562596 | 7 | handle |
| p.(8handle) | -4.737862 | -4.222193 | -4.272990 | 8 | handle |
| p.(9handle) | -4.859575 | -4.343906 | -4.394703 | 9 | handle |
| p.(10handle) | -4.734775 | -4.219107 | -4.269903 | 10 | handle |
| p.(11handle) | -4.670958 | -4.155290 | -4.206086 | 11 | handle |
| p.(12handle) | -4.592150 | -4.076482 | -4.127278 | 12 | handle |
| p.(13handle) | -4.577787 | -4.062119 | -4.112915 | 13 | handle |
| p.(14handle) | -4.670958 | -4.155290 | -4.206086 | 14 | handle |
| p.(15handle) | -4.522010 | -4.006342 | -4.057138 | 15 | handle |
| p.(16handle) | -4.548427 | -4.032759 | -4.083555 | 16 | handle |
| p.(17handle) | -4.323886 | -3.808218 | -3.859014 | 17 | handle |
| p.(18handle) | -4.205228 | -3.689560 | -3.740356 | 18 | handle |
| p.(19handle) | -4.246365 | -3.730697 | -3.781493 | 19 | handle |
| p.(20handle) | -4.364987 | -3.849319 | -3.900116 | 20 | handle |
| p.(21handle) | -4.231136 | -3.715468 | -3.766265 | 21 | handle |
| p.(22handle) | -4.194674 | -3.679006 | -3.729802 | 22 | handle |
| p.(23handle) | -4.475035 | -3.959367 | -4.010163 | 23 | handle |
| p.(24handle) | -4.070571 | -3.554903 | -3.605699 | 24 | handle |
| p.(25handle) | -4.281022 | -3.765354 | -3.816150 | 25 | handle |
| p.(26handle) | -4.404466 | -3.888798 | -3.939594 | 26 | handle |
| p.(27handle) | -4.076559 | -3.560891 | -3.611688 | 27 | handle |
| p.(28handle) | -3.928856 | -3.413187 | -3.463984 | 28 | handle |
| p.(29handle) | -4.145751 | -3.630083 | -3.680879 | 29 | handle |
| p.(30handle) | -3.976162 | -3.460494 | -3.511290 | 30 | handle |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| p.(833handle) | -3.8571402 | -3.3414722 | -3.3922684 | 833 | handle |
| p.(834handle) | -3.9007835 | -3.3851154 | -3.4359117 | 834 | handle |
| p.(835handle) | -4.0943113 | -3.5786433 | -3.6294395 | 835 | handle |
| p.(836handle) | -4.4044662 | -3.8887982 | -3.9395944 | 836 | handle |
| p.(837handle) | -4.4830190 | -3.9673510 | -4.0181472 | 837 | handle |
| p.(838handle) | -4.1457512 | -3.6300832 | -3.6808794 | 838 | handle |
| p.(839handle) | -4.0943113 | -3.5786433 | -3.6294395 | 839 | handle |
| p.(840handle) | -4.6098181 | -4.0941500 | -4.1449462 | 840 | handle |
| p.(841handle) | -4.3331666 | -3.8174985 | -3.8682947 | 841 | handle |
| p.(842handle) | -4.1345466 | -3.6188786 | -3.6696748 | 842 | handle |
| p.(843handle) | -4.1513068 | -3.6356387 | -3.6864350 | 843 | handle |
| p.(844handle) | -4.2052282 | -3.6895602 | -3.7403564 | 844 | handle |
| p.(845handle) | -4.5521444 | -4.0364764 | -4.0872726 | 845 | handle |
| p.(846handle) | -4.4629380 | -3.9472700 | -3.9980662 | 846 | handle |
| p.(847handle) | -4.4215242 | -3.9058562 | -3.9566524 | 847 | handle |
| p.(848handle) | -4.1786309 | -3.6629628 | -3.7137591 | 848 | handle |
| p.(849handle) | -3.9288555 | -3.4131875 | -3.4639837 | 849 | handle |
| p.(850handle) | -3.7472258 | -3.2315577 | -3.2823539 | 850 | handle |
| p.(851handle) | -3.2111887 | -2.6955207 | -2.7463169 | 851 | handle |
| p.(852handle) | -3.7878818 | -3.2722137 | -3.3230099 | 852 | handle |
| p.(853handle) | -4.4341294 | -3.9184614 | -3.9692576 | 853 | handle |
| p.(854handle) | -4.3514737 | -3.8358057 | -3.8866019 | 854 | handle |
| p.(855handle) | -3.7718172 | -3.2561491 | -3.3069453 | 855 | handle |
| p.(856handle) | -3.7388924 | -3.2232243 | -3.2740206 | 856 | handle |
| p.(857handle) | -4.3827263 | -3.8670582 | -3.9178544 | 857 | handle |
| p.(858handle) | -4.0705713 | -3.5549033 | -3.6056995 | 858 | handle |
| p.(859handle) | 1.5953199 | 0.5512497 | 0.8540936 | 859 | handle |
| p.(860handle) | 5.9323323 | 4.6930604 | 4.8115587 | 860 | handle |
| p.(861handle) | 5.8091086 | 5.1633204 | 5.7076631 | 861 | handle |
| p.(862handle) | -0.5538406 | -3.3340094 | -3.3848057 | 862 | handle |
| insertion | RepA_score | RepB_score | RepC_score | indel | dataset |
|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> |
| p.(863handle) | 5.733648 | 5.706546 | 5.5451139 | 863 | handle |
| p.(864handle) | 5.570286 | 5.669916 | 5.6042691 | 864 | handle |
| p.(865handle) | 5.273083 | 5.358577 | 5.4957911 | 865 | handle |
| p.(866handle) | 5.165721 | 5.183828 | 5.2658354 | 866 | handle |
| p.(867handle) | 4.962356 | 5.085517 | 4.7423284 | 867 | handle |
| p.(868handle) | 4.653415 | 4.296234 | 4.7650212 | 868 | handle |
| p.(869handle) | 4.074360 | 3.999124 | 4.4920633 | 869 | handle |
| p.(870handle) | 3.562570 | 3.512570 | 3.5321248 | 870 | handle |
| p.(871handle) | 1.660696 | 1.238419 | 1.5909884 | 871 | handle |
| p.(872handle) | -5.715586 | -5.822991 | -5.8704747 | 872 | handle |
| p.(873handle) | 1.264993 | 1.043876 | 0.2967718 | 873 | handle |
| p.(874handle) | -4.919616 | -5.027021 | -5.0745048 | 874 | handle |
| p.(875handle) | -4.749205 | -4.856610 | -4.9040936 | 875 | handle |
| p.(876handle) | -4.749205 | -4.856610 | -4.9040936 | 876 | handle |
| p.(877handle) | -5.088795 | -5.196200 | -5.2436835 | 877 | handle |
| p.(878handle) | -4.882053 | -4.989458 | -5.0369417 | 878 | handle |
| p.(879handle) | -5.163737 | -5.271141 | -5.3186249 | 879 | handle |
| p.(880handle) | -5.032780 | -5.140185 | -5.1876689 | 880 | handle |
| p.(881handle) | -4.919616 | -5.027021 | -5.0745048 | 881 | handle |
| p.(882handle) | -5.163737 | -5.271141 | -5.3186249 | 882 | handle |
| p.(883handle) | -4.901011 | -5.008416 | -5.0558996 | 883 | handle |
| p.(884handle) | -4.781466 | -4.888870 | -4.9363544 | 884 | handle |
| p.(885handle) | -4.955819 | -5.063224 | -5.1107078 | 885 | handle |
| p.(886handle) | -4.669618 | -4.777022 | -4.8245061 | 886 | handle |
| p.(887handle) | -4.843024 | -4.950428 | -4.9979123 | 887 | handle |
| p.(888handle) | -4.669618 | -4.777022 | -4.8245061 | 888 | handle |
| p.(889handle) | -4.928791 | -5.036195 | -5.0836791 | 889 | handle |
| p.(890handle) | -4.738216 | -4.845620 | -4.8931044 | 890 | handle |
| p.(891handle) | -4.595965 | -4.703369 | -4.7508532 | 891 | handle |
| p.(892handle) | -4.399531 | -4.506936 | -4.5544198 | 892 | handle |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| p.(2164handle) | -5.049107 | -5.156512 | -5.203996 | 2164 | handle |
| p.(2165handle) | -5.149191 | -5.256595 | -5.304079 | 2165 | handle |
| p.(2166handle) | -4.760075 | -4.867479 | -4.914963 | 2166 | handle |
| p.(2167handle) | -4.964669 | -5.072073 | -5.119557 | 2167 | handle |
| p.(2168handle) | -5.156490 | -5.263895 | -5.311379 | 2168 | handle |
| p.(2169handle) | -4.621123 | -4.728528 | -4.776012 | 2169 | handle |
| p.(2170handle) | -5.104239 | -5.211644 | -5.259128 | 2170 | handle |
| p.(2171handle) | -5.080983 | -5.188387 | -5.235871 | 2171 | handle |
| p.(2172handle) | -5.088795 | -5.196200 | -5.243683 | 2172 | handle |
| p.(2173handle) | -4.681383 | -4.788787 | -4.836271 | 2173 | handle |
| p.(2174handle) | -5.096547 | -5.203951 | -5.251435 | 2174 | handle |
| p.(2175handle) | -4.937882 | -5.045286 | -5.092770 | 2175 | handle |
| p.(2176handle) | -5.049107 | -5.156512 | -5.203996 | 2176 | handle |
| p.(2177handle) | -5.096547 | -5.203951 | -5.251435 | 2177 | handle |
| p.(2178handle) | -5.119449 | -5.226853 | -5.274337 | 2178 | handle |
| p.(2179handle) | -5.185166 | -5.292570 | -5.340054 | 2179 | handle |
| p.(2180handle) | -5.126968 | -5.234372 | -5.281856 | 2180 | handle |
| p.(2181handle) | -5.016183 | -5.123587 | -5.171071 | 2181 | handle |
| p.(2182handle) | -5.065172 | -5.172576 | -5.220060 | 2182 | handle |
| p.(2183handle) | -4.919616 | -5.027021 | -5.074505 | 2183 | handle |
| p.(2184handle) | -5.134430 | -5.241835 | -5.289319 | 2184 | handle |
| p.(2185handle) | -4.982137 | -5.089541 | -5.137025 | 2185 | handle |
| p.(2186handle) | -5.156490 | -5.263895 | -5.311379 | 2186 | handle |
| p.(2187handle) | -4.982137 | -5.089541 | -5.137025 | 2187 | handle |
| p.(2188handle) | -5.292292 | -5.399696 | -5.447180 | 2188 | handle |
| p.(2189handle) | -5.065172 | -5.172576 | -5.220060 | 2189 | handle |
| p.(2190handle) | -5.080983 | -5.188387 | -5.235871 | 2190 | handle |
| p.(2191handle) | -4.862729 | -4.970133 | -5.017617 | 2191 | handle |
| p.(2192handle) | -4.862729 | -4.970133 | -5.017617 | 2192 | handle |
| p.(2193handle) | -4.193679 | -4.301084 | -4.348568 | 2193 | handle |
Call:
lm(formula = RepA_score ~ RepB_score, data = Replication_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-3.6173 -0.0177 -0.0146 -0.0091 4.1874
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.032423 0.029248 1.109 0.268
RepB_score 0.983905 0.005388 182.620 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2713 on 1329 degrees of freedom
Multiple R-squared: 0.9617, Adjusted R-squared: 0.9616
F-statistic: 3.335e+04 on 1 and 1329 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_score ~ RepC_score, data = Replication_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-3.1609 -0.0158 -0.0116 -0.0044 3.2879
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04727 0.02589 1.826 0.0681 .
RepC_score 0.97876 0.00473 206.917 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2405 on 1329 degrees of freedom
Multiple R-squared: 0.9699, Adjusted R-squared: 0.9699
F-statistic: 4.281e+04 on 1 and 1329 DF, p-value: < 2.2e-16
Call:
lm(formula = RepB_score ~ RepC_score, data = Replication_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-3.7130 -0.0092 -0.0040 0.0052 3.6628
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.098954 0.027692 -3.573 0.000365 ***
RepC_score 0.973222 0.005059 192.363 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2572 on 1329 degrees of freedom
Multiple R-squared: 0.9653, Adjusted R-squared: 0.9653
F-statistic: 3.7e+04 on 1 and 1329 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_score ~ RepB_score, data = Capsid_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-7.0007 -0.0292 -0.0246 -0.0208 6.9329
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.55265 0.06189 -8.93 <2e-16 ***
RepB_score 0.98417 0.01583 62.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4581 on 860 degrees of freedom
Multiple R-squared: 0.818, Adjusted R-squared: 0.8178
F-statistic: 3866 on 1 and 860 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_score ~ RepC_score, data = Capsid_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-5.0037 -0.0270 -0.0004 0.0224 4.5160
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.82831 0.05718 -14.49 <2e-16 ***
RepC_score 0.90780 0.01449 62.66 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4553 on 860 degrees of freedom
Multiple R-squared: 0.8203, Adjusted R-squared: 0.8201
F-statistic: 3926 on 1 and 860 DF, p-value: < 2.2e-16
Call:
lm(formula = RepB_score ~ RepC_score, data = Capsid_P2_Scores_shared)
Residuals:
Min 1Q Median 3Q Max
-5.6998 -0.0273 0.0150 0.0506 2.3964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.53696 0.04620 -11.62 <2e-16 ***
RepC_score 0.85479 0.01171 73.02 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3678 on 860 degrees of freedom
Multiple R-squared: 0.8611, Adjusted R-squared: 0.861
F-statistic: 5332 on 1 and 860 DF, p-value: < 2.2e-16
In [99]:
#Comparing the replicates for deletion handle library: Passage 2 Scatter plots of replicates against each other.
#Passage 2 Replication shared Deletion
#RepA vs RepB Passage 2 Replication
Scatter_P2_deletion__Replication_RepA_RepB <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_deletion__Replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepA_RepB,filename = "Scatter_P2_deletion__Replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepA_RepB <- summary(model_P2_deletion_replication_RepA_RepB)
summary_model_P2_deletion_replication_RepA_RepB
#RepA vs RepC Passage 2 Replication
Scatter_P2_deletion__Replication_RepA_RepC <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_deletion__Replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepA_RepC,filename = "Scatter_P2_deletion__Replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepA_RepC <- summary(model_P2_deletion_replication_RepA_RepC)
summary_model_P2_deletion_replication_RepA_RepC
#RepB vs RepC Passage 2 Replication
Scatter_P2_deletion__Replication_RepB_RepC <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_deletion__Replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepB_RepC,filename = "Scatter_P2_deletion__Replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepB_RepC <- summary(model_P2_deletion_replication_RepB_RepC)
summary_model_P2_deletion_replication_RepB_RepC
#Passage 2 Capsid shared Deletion
#RepA vs RepB Passage 2 Capsid
Scatter_P2_deletion_RepA_RepB <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_deletion_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepA_RepB,filename = "Scatter_P2_deletion_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepA_RepB <- summary(model_P2_deletion_RepA_RepB)
summary_model_P2_deletion_RepA_RepB
#RepA vs RepC Passage 2 Capsid
Scatter_P2_deletion_RepA_RepC <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_deletion_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepA_RepC,filename = "Scatter_P2_deletion_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepA_RepC <- summary(model_P2_deletion_RepA_RepC)
summary_model_P2_deletion_RepA_RepC
#RepB vs RepC Passage 2 Capsid
Scatter_P2_deletion_RepB_RepC <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_deletion_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepB_RepC,filename = "Scatter_P2_deletion_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepB_RepC <- summary(model_P2_deletion_RepB_RepC)
summary_model_P2_deletion_RepB_RepC
Warning message: “Removed 1 rows containing missing values (`geom_point()`).” Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepB_score, data = Replication_P2_Deletion_shared)
Residuals:
Min 1Q Median 3Q Max
-3.9004 -0.0175 -0.0024 0.0173 3.4109
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.127498 0.017498 -7.286 3.82e-13 ***
RepB_score 0.935870 0.003587 260.942 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2241 on 3959 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9451, Adjusted R-squared: 0.945
F-statistic: 6.809e+04 on 1 and 3959 DF, p-value: < 2.2e-16
Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepC_score, data = Replication_P2_Deletion_shared)
Residuals:
Min 1Q Median 3Q Max
-3.9536 -0.0180 -0.0052 0.0116 3.3866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.041769 0.017104 2.442 0.0146 *
RepC_score 0.945384 0.003417 276.683 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.212 on 3959 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9508, Adjusted R-squared: 0.9508
F-statistic: 7.655e+04 on 1 and 3959 DF, p-value: < 2.2e-16
Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepB_score ~ RepC_score, data = Replication_P2_Deletion_shared)
Residuals:
Min 1Q Median 3Q Max
-3.6258 -0.0172 -0.0100 -0.0002 4.2126
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.021357 0.021839 -0.978 0.328
RepC_score 0.968960 0.004363 222.102 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2707 on 3959 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9257, Adjusted R-squared: 0.9257
F-statistic: 4.933e+04 on 1 and 3959 DF, p-value: < 2.2e-16
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepB_score, data = Capsid_P2_Deletions_shared)
Residuals:
Min 1Q Median 3Q Max
-3.8571 -0.0107 0.0376 0.0802 3.1709
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.739523 0.015342 48.2 <2e-16 ***
RepB_score 0.795080 0.006856 116.0 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3094 on 2471 degrees of freedom
(17 observations deleted due to missingness)
Multiple R-squared: 0.8448, Adjusted R-squared: 0.8447
F-statistic: 1.345e+04 on 1 and 2471 DF, p-value: < 2.2e-16
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepC_score, data = Capsid_P2_Deletions_shared)
Residuals:
Min 1Q Median 3Q Max
-3.5576 -0.0116 0.0522 0.1084 3.1848
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.694602 0.018359 37.83 <2e-16 ***
RepC_score 0.729229 0.007752 94.06 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3669 on 2471 degrees of freedom
(17 observations deleted due to missingness)
Multiple R-squared: 0.7817, Adjusted R-squared: 0.7816
F-statistic: 8848 on 1 and 2471 DF, p-value: < 2.2e-16
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Warning message: “Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepB_score ~ RepC_score, data = Capsid_P2_Deletions_shared)
Residuals:
Min 1Q Median 3Q Max
-3.4108 -0.0115 0.0172 0.0434 4.6884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.143661 0.017825 -8.06 1.18e-15 ***
RepC_score 0.876981 0.007527 116.51 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3563 on 2471 degrees of freedom
(17 observations deleted due to missingness)
Multiple R-squared: 0.846, Adjusted R-squared: 0.8459
F-statistic: 1.358e+04 on 1 and 2471 DF, p-value: < 2.2e-16
In [100]:
#Passage 2 Replication shared DMS
#RepA vs RepB Passage 2 Replication
Scatter_P2_DMS__Replication_RepA_RepB <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepB_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_DMS__Replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepA_RepB,filename = "Scatter_P2_DMS__Replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepA_RepB <- lm(`RepA_DMS_score` ~ `RepB_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepA_RepB <- summary(model_P2_DMS_replication_RepA_RepB)
summary_model_P2_DMS_replication_RepA_RepB
#RepA vs RepC Passage 2 Replication
Scatter_P2_DMS__Replication_RepA_RepC <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepA_RepC,filename = "Scatter_P2_DMS__Replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepA_RepC <- lm(`RepA_DMS_score` ~ `RepC_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepA_RepC <- summary(model_P2_DMS_replication_RepA_RepC)
summary_model_P2_DMS_replication_RepA_RepC
#RepB vs RepC Passage 2 Replication
Scatter_P2_DMS__Replication_RepB_RepC <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepB_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepB_RepC,filename = "Scatter_P2_DMS__Replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepB_RepC <- lm(`RepB_DMS_score` ~ `RepC_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepB_RepC <- summary(model_P2_DMS_replication_RepB_RepC)
summary_model_P2_DMS_replication_RepB_RepC
#Passage 2 Capsid shared DMS
#RepA vs RepB Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepA_RepB <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepB_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_DMS__Capsid_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepA_RepB,filename = "Scatter_P2_DMS__Capsid_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepA_RepB <- lm(`RepA_DMS_score` ~ `RepB_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepA_RepB <- summary(model_P2_DMS_capsid_RepA_RepB)
summary_model_P2_DMS_capsid_RepA_RepB
#RepA vs RepC Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepA_RepC <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Capsid_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepA_RepC,filename = "Scatter_P2_DMS__Capsid_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepA_RepC <- lm(`RepA_DMS_score` ~ `RepC_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepA_RepC <- summary(model_P2_DMS_capsid_RepA_RepC)
summary_model_P2_DMS_capsid_RepA_RepC
#RepB vs RepC Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepB_RepC <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepB_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Capsid_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepB_RepC,filename = "Scatter_P2_DMS__Capsid_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepB_RepC <- lm(`RepB_DMS_score` ~ `RepC_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepB_RepC <- summary(model_P2_DMS_capsid_RepB_RepC)
summary_model_P2_DMS_capsid_RepB_RepC
Call:
lm(formula = RepA_DMS_score ~ RepB_DMS_score, data = Replication_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-5.0238 -0.3853 0.0827 0.4296 4.6959
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.171714 0.007222 -23.78 <2e-16 ***
RepB_DMS_score 0.904960 0.002336 387.33 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8165 on 25287 degrees of freedom
Multiple R-squared: 0.8558, Adjusted R-squared: 0.8558
F-statistic: 1.5e+05 on 1 and 25287 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_DMS_score ~ RepC_DMS_score, data = Replication_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-3.9821 -0.3803 0.0943 0.4233 4.1366
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.227119 0.006812 -33.34 <2e-16 ***
RepC_DMS_score 1.067687 0.002627 406.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7835 on 25287 degrees of freedom
Multiple R-squared: 0.8672, Adjusted R-squared: 0.8672
F-statistic: 1.651e+05 on 1 and 25287 DF, p-value: < 2.2e-16
Call:
lm(formula = RepB_DMS_score ~ RepC_DMS_score, data = Replication_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-3.8821 -0.3909 0.0810 0.4291 4.6228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.221552 0.007011 -31.6 <2e-16 ***
RepC_DMS_score 1.090281 0.002704 403.2 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8063 on 25287 degrees of freedom
Multiple R-squared: 0.8654, Adjusted R-squared: 0.8654
F-statistic: 1.626e+05 on 1 and 25287 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_DMS_score ~ RepB_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-4.6017 -0.3444 0.0706 0.3789 4.1668
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.149415 0.009167 -16.3 <2e-16 ***
RepB_DMS_score 0.940441 0.002738 343.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7283 on 16376 degrees of freedom
Multiple R-squared: 0.8781, Adjusted R-squared: 0.8781
F-statistic: 1.18e+05 on 1 and 16376 DF, p-value: < 2.2e-16
Call:
lm(formula = RepA_DMS_score ~ RepC_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-4.4147 -0.4412 0.0611 0.3973 5.5852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.210338 0.010317 -20.39 <2e-16 ***
RepC_DMS_score 0.946228 0.003172 298.30 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8223 on 16376 degrees of freedom
Multiple R-squared: 0.8446, Adjusted R-squared: 0.8446
F-statistic: 8.898e+04 on 1 and 16376 DF, p-value: < 2.2e-16
Call:
lm(formula = RepB_DMS_score ~ RepC_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)
Residuals:
Min 1Q Median 3Q Max
-5.3287 -0.4338 0.0209 0.3516 5.7200
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.201484 0.009692 -20.79 <2e-16 ***
RepC_DMS_score 0.952428 0.002980 319.62 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7725 on 16376 degrees of freedom
Multiple R-squared: 0.8618, Adjusted R-squared: 0.8618
F-statistic: 1.022e+05 on 1 and 16376 DF, p-value: < 2.2e-16
In [101]:
#Figure 2. Fitness effects of insertions, deletions, and amino acid changes in EV-A71
#Bar plot of Insertional Handle Enrich2 P2
Scores_Insertional_Handle_Fullproteome_Plot <- ggplot()+stat_summary_bin(data = Scores_Insertional_Handle_Fullproteome, show.legend = F, geom = "bar", aes(indel, score), binwidth =1,fill="#1558a9")+
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax = 8), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-6,8))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey', lwd=0.75, yintercept=0)
ggsave(plot = Scores_Insertional_Handle_Fullproteome_Plot,filename = "Scores_Insertional_Handle_Fullproteome.pdf",width = 4,height=1.5)
Scores_Insertional_Handle_Fullproteome_Plot
#Bar plot of Deletion library Enrich2 P2
Scores_Deletions1AA_Fullproteome_plot <- ggplot()+stat_summary_bin(data = Scores_Deletions1AA_Fullproteome_1AA, show.legend = F, geom = "bar", aes(indel, score), binwidth =1,fill="#a91515")+
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax =8), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-6,8))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey', lwd=0.75, yintercept=0)
ggsave(plot = Scores_Deletions1AA_Fullproteome_plot,filename = "Scores_Deletions1AA_Fullproteome_plot.pdf",width = 4,height=1.5)
Scores_Deletions1AA_Fullproteome_plot
#Bar plot of DMS library Enrich2 P2
Scores_DMS_Fullproteome_plot <- ggplot()+stat_summary_bin(data = Fullproteome_P2_DMS_Enrich2_long, show.legend = F, geom = "bar", aes(position, score), binwidth =1,fill="#3c7430")+
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax =3), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-5,3))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey', lwd=0.75, yintercept=0)
ggsave(plot = Scores_DMS_Fullproteome_plot,filename = "Scores_DMS_Fullproteome_plot.pdf",width = 4,height=1.5)
Scores_DMS_Fullproteome_plot
No summary function supplied, defaulting to `mean_se()` No summary function supplied, defaulting to `mean_se()` Warning message: “Removed 42 rows containing non-finite values (`stat_summary_bin()`).” No summary function supplied, defaulting to `mean_se()` Warning message: “Removed 42 rows containing non-finite values (`stat_summary_bin()`).” No summary function supplied, defaulting to `mean_se()`
Warning message: “Removed 2193 rows containing non-finite values (`stat_summary_bin()`).” No summary function supplied, defaulting to `mean_se()` Warning message: “Removed 2193 rows containing non-finite values (`stat_summary_bin()`).” No summary function supplied, defaulting to `mean_se()`
In [102]:
#Figure 2: DMFE of insertions, deletions, and DMS
#Selecting only the score column and plotting a histogram for insertions
histo_insertions_sel <- select(Scores_Insertional_Handle_Fullproteome,score)
histo_insertions_sel_plot <- ggplot(histo_insertions_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#1558a9"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
histo_insertions_sel_plot
#Save Figure
ggsave(plot = histo_insertions_sel_plot,filename = "histo_insertions_sel_plot.pdf",width = 2,height=1.5)
#Filtering df for insertions with a score higher than 0
histo_insertions_sel_nonlethal <- filter(histo_insertions_sel,score>0)
#Plotting a histogram for insertions with a score higher than 0
histo_insertions_sel_nonlethal_plot <- ggplot(histo_insertions_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#1558a9"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
histo_insertions_sel_nonlethal_plot
#Save Figure
ggsave(plot = histo_insertions_sel_nonlethal_plot,filename = "histo_insertions_sel_nonlethal_plot.pdf",width = 2,height=1.5)
#Selecting only the score column and plotting a histogram for deletions
histo_del_sel <- select(Scores_Deletions_Fullproteome,score)
del_histogram <- ggplot(histo_del_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
del_histogram
#Save Figure
ggsave(plot = del_histogram,filename = "del_histogram.pdf",width = 2,height=1.5)
#Filtering df for deletions with a score higher than 0
histo_del_sel_nonlethal <- filter(histo_del_sel,score>0)
#Plotting a histogram for deletions with a score higher than 0
del_histogram_nonlethal <- ggplot(histo_del_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
del_histogram_nonlethal
#Save Figure
ggsave(plot = del_histogram_nonlethal,filename = "del_histogram_nonlethal.pdf",width = 2,height=1.5)
#Selecting only the score column and plotting a histogram for DMS
histo_DMS_sel <- select(Fullproteome_P2_DMS_Enrich2_long,score)
histo_DMS_sel_plot <- ggplot(histo_DMS_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#3c7430"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
histo_DMS_sel_plot
#Save Figure
ggsave(plot = histo_DMS_sel_plot,filename = "histo_DMS_sel_plot.pdf",width = 2,height=1.5)
#Filtering df for DMS with a score higher than 0
histo_DMS_sel_nonlethal <- filter(histo_DMS_sel,score>0)
#Plotting a histogram for DMS with a score higher than 0
histo_DMS_sel_plot_nonlethal_plot <- ggplot(histo_DMS_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#3c7430"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
histo_DMS_sel_plot_nonlethal_plot
#Save Figure
ggsave(plot = histo_DMS_sel_plot_nonlethal_plot,filename = "histo_DMS_sel_plot_nonlethal.pdf",width = 2,height=1.5)
Warning message: “Removed 145 rows containing non-finite values (`stat_bin()`).”
Warning message: “Removed 145 rows containing non-finite values (`stat_bin()`).”
Warning message: “Removed 2193 rows containing non-finite values (`stat_bin()`).”
Warning message: “Removed 2193 rows containing non-finite values (`stat_bin()`).”
In [103]:
##enrich2:
## enrich2<0: low
## 0<enrich2<2 : mid
## enrich2>2: high
#Figure 2: Classification of variants into different enrich2 class scores, and comparing distribution for different viral proteins
# Create a mapping of positions to viral proteins
Viral_protein_Mapping <- data.frame(
start = c(1, 70, 324,566,863,1013,1112,1441,1527,1549,1732),
end = c(69, 323, 565,862,1012,1111,1440,1526,1548,1731,2193),
viralprotein = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))
#Setting colors for different enrich2 classes
fitness_class_colors <- c( "#202E38","#7B9CAF","#9DCCED")
fitness_class_colors_del <- c( "#441E1E","#BF7B7B","#FFD9D9")
fitness_class_colors_dms <- c( "#285e1b","#529143","#a6f492")
#Dataframe for insertional handle for comparison of enrich2 classes
Handle_P2_Fitness_Class <- Scores_Insertional_Handle_Fullproteome %>% mutate(viralprotein = sapply(indel, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)
#Adding scores classes depending on the enrich2 scores and plotting the area plot
Handle_P2_Fitness_Class <- na.omit(Handle_P2_Fitness_Class)
Handle_P2_Fitness_Class$Class="N"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score<2,]$Class="N"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score<0,]$Class="L"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score>2,]$Class="B"
Handle_P2_Fitness_Class_counts <- table(Handle_P2_Fitness_Class$Class,Handle_P2_Fitness_Class$viralprotein)
chisq.test(Handle_P2_Fitness_Class_counts)
Handle_P2_Fitness_Class$viralprotein <- factor(Handle_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))
Handle_P2_Fitness_Class$Class <- factor(Handle_P2_Fitness_Class$Class, levels = c("B", "N","L"))
plot(as.factor(Handle_P2_Fitness_Class$viralprotein),as.factor(Handle_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors,lwd = 0.6)
pdf("Insertion_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(Handle_P2_Fitness_Class$viralprotein),as.factor(Handle_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors,lwd = 0.6)
dev.off()
#Dataframe for deletion library for comparison of enrich2 classes
Deletion3bp_P2_Fitness_Class <- Scores_Deletions1AA_Fullproteome_1AA %>% mutate(viralprotein = sapply(indel, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)
#Adding scores classes depending on the enrich2 scores and plotting the area plot
Deletion3bp_P2_Fitness_Class <- na.omit(Deletion3bp_P2_Fitness_Class)
Deletion3bp_P2_Fitness_Class$Class="N"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score<2,]$Class="N"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score<0,]$Class="L"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score>2,]$Class="B"
Deletion3bp_P2_Fitness_Class_Counts <- table(Deletion3bp_P2_Fitness_Class$Class,Deletion3bp_P2_Fitness_Class$viralprotein)
chisq.test(Deletion3bp_P2_Fitness_Class_Counts)
Deletion3bp_P2_Fitness_Class$viralprotein <- factor(Deletion3bp_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))
Deletion3bp_P2_Fitness_Class$Class <- factor(Deletion3bp_P2_Fitness_Class$Class, levels = c("B", "N","L"))
plot(as.factor(Deletion3bp_P2_Fitness_Class$viralprotein),as.factor(Deletion3bp_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors_del,lwd = 0.6)
pdf("Deletion_Fitness_Class_3bp.pdf",width = 4, height = 4)
plot(as.factor(Deletion3bp_P2_Fitness_Class$viralprotein),as.factor(Deletion3bp_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors_del,lwd = 0.6)
dev.off()
#Dataframe for DMS for comparison of enrich2 classes
DMS_P2_Fitness_Class <- Fullproteome_P2_DMS_Enrich2_long %>% mutate(viralprotein = sapply(position, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)
#Adding scores classes depending on the enrich2 scores and plotting the area plot
DMS_P2_Fitness_Class <- na.omit(DMS_P2_Fitness_Class)
DMS_P2_Fitness_Class$Class="N"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score<2,]$Class="N"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score<0,]$Class="L"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score>2,]$Class="B"
DMS_P2_Fitness_Class_Counts <- table(DMS_P2_Fitness_Class$Class,DMS_P2_Fitness_Class$viralprotein)
chisq.test(DMS_P2_Fitness_Class_Counts)
DMS_P2_Fitness_Class$viralprotein <- factor(DMS_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))
DMS_P2_Fitness_Class$Class <- factor(DMS_P2_Fitness_Class$Class, levels = c("B", "N","L"))
plot(as.factor(DMS_P2_Fitness_Class$viralprotein),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
pdf("DMS_P2_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(DMS_P2_Fitness_Class$viralprotein),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
dev.off()
Warning message in chisq.test(Handle_P2_Fitness_Class_counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: Handle_P2_Fitness_Class_counts X-squared = 140.43, df = 20, p-value < 2.2e-16
png: 2
Warning message in chisq.test(Deletion3bp_P2_Fitness_Class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: Deletion3bp_P2_Fitness_Class_Counts X-squared = 84.685, df = 20, p-value = 6.204e-10
png: 2
Pearson's Chi-squared test data: DMS_P2_Fitness_Class_Counts X-squared = 1793.8, df = 20, p-value < 2.2e-16
png: 2
In [104]:
#Supplementary Figure 6. Enrich2 scores for insertions, deletions, and amino acid changes across the EV-A71 proteome
#Making insertional df compatible for merge
Scores_Insertional_Handle_Fullproteome_formerge <- Scores_Insertional_Handle_Fullproteome %>% mutate(AminoAcid = "Ins")
Scores_Insertional_Handle_Fullproteome_formerge <- Scores_Insertional_Handle_Fullproteome_formerge %>% rename(position = colnames(Scores_Insertional_Handle_Fullproteome_formerge)[1])
#Making deletional library df compatible for merge
Scores_Deletions_Fullproteome_formerge <- Scores_Deletions_Fullproteome %>% rename(AminoAcid = colnames(Scores_Deletions_Fullproteome)[3])
Scores_Deletions_Fullproteome_formerge <- Scores_Deletions_Fullproteome_formerge %>% rename(position = colnames(Scores_Deletions_Fullproteome_formerge)[1])
#Merging all dataframes
merged_df_indel_DMS <- merge(Scores_Insertional_Handle_Fullproteome_formerge, Scores_Deletions_Fullproteome_formerge, by = c("position","AminoAcid","score"),all = TRUE)
merged_df_indel_DMS <- merge(merged_df_indel_DMS, Fullproteome_P2_DMS_Enrich2_long, by = c("position","AminoAcid","score"),all = TRUE)
merged_df_indel_DMS$AminoAcid=factor(merged_df_indel_DMS$AminoAcid,levels = c("3AAdel","2AAdel","1AAdel","Ins","P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
write.table(merged_df_indel_DMS,file="merged_df_indel_DMS.csv",row.names=FALSE,sep=",")
# Convert Position to a factor with breaks
breaks <- seq(0, 2193, by = 250)
# Creating a function to assign labels based on position
get_label <- function(position) {
sub_labels <- c("A", "B", "C", "D", "E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V")
sub_index <- findInterval(position, breaks)
return(sub_labels[sub_index])
}
merged_df_indel_DMS$sub <- sapply(merged_df_indel_DMS$position, get_label)
#Plotting the heatmap of the complete proteome
DMS_indel_Heatmap_fullproteome <- ggplot(merged_df_indel_DMS) +
geom_tile(aes(position, AminoAcid, fill = score)) +
scale_fill_gradient2(midpoint = 0,limits = c(-8,8),low = muted("orchid2"), mid = "white",
high = muted("aquamarine1"),space = "Lab",na.value = "black") + theme_classic() +
facet_wrap(sub ~ ., scales = "free_x",ncol=1) + theme(aspect.ratio = 0.1,axis.text.y = element_text(size = 3))+
theme(strip.text.x = element_blank())
#Save figure
ggsave(plot = DMS_indel_Heatmap_fullproteome,filename = "DMS_indel_Heatmap_fullproteome.pdf",width = 8,height=11)
DMS_indel_Heatmap_fullproteome
In [105]:
#Figure 3. Impact of altering insertion sequence, deletion length, and amino acid residue substitution on EV-A71 growth
#Creating a factor for Amino acid changes and plotting area plot for different amino acids
DMS_P2_Fitness_Class$AminoAcid=factor(DMS_P2_Fitness_Class$AminoAcid,levels = c("P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
DMS_P2_Fitness_Class_Counts_AA <- table(DMS_P2_Fitness_Class$Class,DMS_P2_Fitness_Class$AminoAcid)
chisq.test(DMS_P2_Fitness_Class_Counts_AA)
plot(as.factor(DMS_P2_Fitness_Class$AminoAcid),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
pdf("DMS_P2_Fitness_Class_Aminoacids.pdf",width = 4, height = 4)
plot(as.factor(DMS_P2_Fitness_Class$AminoAcid),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
dev.off()
#Area plot for deletions of different lengths
Deletion_P2_Fitness_Class <- Scores_Deletions_Fullproteome
Deletion_P2_Fitness_Class <- na.omit(Deletion_P2_Fitness_Class)
Deletion_P2_Fitness_Class$Class="N"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score<2,]$Class="N"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score<0,]$Class="L"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score>2,]$Class="B"
Deletion_P2_Fitness_Class_Counts <- table(Deletion_P2_Fitness_Class$Class,Deletion_P2_Fitness_Class$dataset)
chisq.test(Deletion_P2_Fitness_Class_Counts)
Deletion_P2_Fitness_Class$Class <- factor(Deletion_P2_Fitness_Class$Class, levels = c("B", "N","L"))
plot(as.factor(Deletion_P2_Fitness_Class$dataset),as.factor(Deletion_P2_Fitness_Class$Class),ylim=c(0.95,1.0),col = fitness_class_colors_del,lwd = 0.6)
pdf("Deletion_Fitness_Class_3_6_9_bp.pdf",width = 4, height = 4)
plot(as.factor(Deletion_P2_Fitness_Class$dataset),as.factor(Deletion_P2_Fitness_Class$Class),ylim=c(0.95,1.0),col = fitness_class_colors_del,lwd = 0.6)
dev.off()
#Creating a factor for the 5 AA insertion library using inserted amino acid as factor
Fullproteome_1AA_Enrich2_long$AminoAcid=factor(Fullproteome_1AA_Enrich2_long$AminoAcid,levels = c("P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
Fullproteome_1AA_Enrich2_long$sub <- sapply(Fullproteome_1AA_Enrich2_long$insertion, get_label)
#Heat map function for the 5 AA insertion library
create_AA_heatmap <- function(lower_bound, upper_bound) {
df <- filter(Fullproteome_1AA_Enrich2_long, insertion > lower_bound, insertion < upper_bound)
SingleAA_Heatmap <- ggplot(df) +
geom_tile(aes(insertion, AminoAcid, fill = score)) +
scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
low = muted("orchid2"),
mid = "white",
high = muted("aquamarine1"),
space = "Lab",
na.value = "black") +
theme_classic() +coord_fixed()
filename <- paste0("SingleAA_Heatmap_", lower_bound, "_to_", upper_bound, ".pdf")
ggsave(filename, height = 6, width = 6)
return(SingleAA_Heatmap)
}
#Heatmaps for the sites at N-termini of VP1/2A/3A and VP3
create_AA_heatmap(lower_bound = 399, upper_bound = 451)
create_AA_heatmap(lower_bound = 549, upper_bound = 601)
create_AA_heatmap(lower_bound = 849, upper_bound = 901)
create_AA_heatmap(lower_bound = 1440, upper_bound = 1492)
# Area plot for 5 AA insertion library
SingleAA_fullproteome_Fitness_Class <- Fullproteome_1AA_Enrich2_long
SingleAA_fullproteome_Fitness_Class <- na.omit(SingleAA_fullproteome_Fitness_Class)
SingleAA_fullproteome_Fitness_Class$Class="N"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score<2,]$Class="N"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score<0,]$Class="L"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score>2,]$Class="B"
SingleAA_fullproteome_Fitness_Class_Counts <- table(SingleAA_fullproteome_Fitness_Class$Class,SingleAA_fullproteome_Fitness_Class$AminoAcid)
chisq.test(SingleAA_fullproteome_Fitness_Class_Counts)
SingleAA_fullproteome_Fitness_Class$Class <- factor(SingleAA_fullproteome_Fitness_Class$Class, levels = c("B", "N","L"))
plot(as.factor(SingleAA_fullproteome_Fitness_Class$AminoAcid),as.factor(SingleAA_fullproteome_Fitness_Class$Class),ylim=c(0.97,1),col = fitness_class_colors,lwd = 0.6)
pdf("SingleAA_fullproteome_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(SingleAA_fullproteome_Fitness_Class$AminoAcid),as.factor(SingleAA_fullproteome_Fitness_Class$Class),ylim=c(0.97,1),col = fitness_class_colors,lwd = 0.6)
dev.off()
#Creating a dataframe with only deletions and then creating bar plots for different deletion lengths at InDel hotspots
merged_df_del_only <- filter(merged_df_indel_DMS,AminoAcid %in% c("1AAdel", "2AAdel","3AAdel"))
merged_df_del_only <- select(merged_df_del_only,position,AminoAcid,score)
merged_df_del_only$AminoAcid=factor(merged_df_del_only$AminoAcid,levels = c("1AAdel", "2AAdel","3AAdel"))
#Bar plot at VP1 N-terminus cleavage site
Deletion_N_term_VP1 <- filter(merged_df_del_only,position>549,position<601)
vp1_nterminus_del <- ggplot(Deletion_N_term_VP1)+
geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
theme(axis.text.x= element_text(angle=90))+ ylab("Enrich2 score") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
vp1_nterminus_del
#Save figure
ggsave(plot = vp1_nterminus_del,filename = "vp1_nterminus_del_hotspot2.pdf",width = 7,height=4)
#Bar plot at VP1-2A cleavage site
Deletion_Cleavage_VP1 <- filter(merged_df_del_only,position>849,position<901)
vp1_2a_cleavage <- ggplot(Deletion_Cleavage_VP1)+
geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
theme(axis.text.x= element_text(angle=90))+ ylab("Relative Fitness") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
vp1_2a_cleavage
#Save figure
ggsave(plot = vp1_2a_cleavage,filename = "vp1_2a_cleavage_hotspot3.pdf",width = 7,height=4)
#Bar plot at 3A N-terminus
Deletion_3A_N_term <- filter(merged_df_del_only,position>1440,position<1492)
del3Adel <- ggplot(Deletion_3A_N_term)+
geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
theme(axis.text.x= element_text(angle=90))+ ylab("Relative Fitness") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
del3Adel
#Save figure
ggsave(plot = del3Adel,filename = "del3Adel_hotspot4.pdf",width = 7,height=4)
Pearson's Chi-squared test data: DMS_P2_Fitness_Class_Counts_AA X-squared = 991.35, df = 38, p-value < 2.2e-16
png: 2
Pearson's Chi-squared test data: Deletion_P2_Fitness_Class_Counts X-squared = 63.252, df = 4, p-value = 6.007e-13
png: 2
Pearson's Chi-squared test data: SingleAA_fullproteome_Fitness_Class_Counts X-squared = 108.69, df = 38, p-value = 9.873e-09
png: 2
Warning message: “Removed 2 rows containing missing values (`geom_bar()`).”
Warning message: “Removed 2 rows containing missing values (`geom_bar()`).”
In [106]:
#Figure 4. Structural interpretation of InDel fitness effects for 2A(pro) and 3D(pol)
#Supplementary Figure 8. Structural interpretation of mutational effects for 3A, 2C, and 3C(pro)
#Parsing dataframe for exponentiated score as an attribute for chimera for the 2A(pro): Insertional Handle
Insertions_2A <- filter(Scores_Insertional_Handle_Fullproteome,indel>862,indel<1013)
Insertions_2A <- na.omit(Insertions_2A)
Insertions_2A_reindexed <- Insertions_2A %>% mutate(indel=indel-862)
Insertions_2A_reindexed$indel <- paste0(Insertions_2A_reindexed$indel, ".")
Insertions_2A_reindexed$indel <- paste0(Insertions_2A_reindexed$indel, "A")
Insertions_2A_reindexed$indel <- paste(":", Insertions_2A_reindexed$indel, sep = "")
write.csv(Insertions_2A_reindexed,file ="2A_Chimera_Attached_Insertions.csv",)
In [107]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the 2A(pro): Deletion 1 AA
Deletions_2A <- filter(Scores_Deletions1AA_Fullproteome_1AA,indel>862,indel<1013)
Deletions_2A <- na.omit(Deletions_2A)
Deletions_2A_reindexed <- Deletions_2A %>% mutate(indel=indel-862)
Deletions_2A_reindexed$indel <- paste0(Deletions_2A_reindexed$indel, ".")
Deletions_2A_reindexed$indel <- paste0(Deletions_2A_reindexed$indel, "A")
Deletions_2A_reindexed$indel <- paste(":", Deletions_2A_reindexed$indel, sep = "")
write.csv(Deletions_2A_reindexed,file ="2A_Chimera_Attached_Deletions.csv",)
In [108]:
#Parsing dataframe for mean exponentiated score as an attribute for chimera for the 2A(pro): DMS
DMS_2A <- filter(Fullproteome_P2_DMS_Enrich2_long,position>862,position<1013)
DMS_2A <- DMS_2A %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
DMS_2A <- DMS_2A %>% replace(is.na(.), "")
DMS_2A_reindexed <- DMS_2A %>% mutate(position=position-862)
DMS_2A_reindexed$position <- paste0(DMS_2A_reindexed$position, ".")
DMS_2A_reindexed$position <- paste0(DMS_2A_reindexed$position, "A")
DMS_2A_reindexed$position <- paste(":", DMS_2A_reindexed$position, sep = "")
write.csv(DMS_2A_reindexed,file ="DMS_2A_reindexed_chimera.csv",)
DMS_2A_reindexed
| position | score |
|---|---|
| <chr> | <dbl> |
| :1.A | -3.2583277 |
| :2.A | 1.6086559 |
| :3.A | 2.2672482 |
| :4.A | 1.6855832 |
| :5.A | 1.5026366 |
| :6.A | 1.2624536 |
| :7.A | 1.5661049 |
| :8.A | 0.6479698 |
| :9.A | -1.3420372 |
| :10.A | -1.9844672 |
| :11.A | 0.3477806 |
| :12.A | -1.1440086 |
| :13.A | -1.1640856 |
| :14.A | -0.3041383 |
| :15.A | -1.8627696 |
| :16.A | -2.5558196 |
| :17.A | -1.9734187 |
| :18.A | -1.6247600 |
| :19.A | -1.2573557 |
| :20.A | -1.6619071 |
| :21.A | -2.6747915 |
| :22.A | -1.0767590 |
| :23.A | -2.3566803 |
| :24.A | -0.1215330 |
| :25.A | 1.9866321 |
| :26.A | 1.6037350 |
| :27.A | -2.6100113 |
| :28.A | -1.4032755 |
| :29.A | 1.4954758 |
| :30.A | -0.3345598 |
| ⋮ | ⋮ |
| :121.A | -1.8821218 |
| :122.A | -3.3446922 |
| :123.A | -2.2803981 |
| :124.A | -2.1335697 |
| :125.A | -2.7440641 |
| :126.A | -1.6002241 |
| :127.A | -3.0068319 |
| :128.A | -1.3730460 |
| :129.A | 1.2482721 |
| :130.A | -1.0821599 |
| :131.A | 0.6388990 |
| :132.A | -2.0803571 |
| :133.A | -2.7253275 |
| :134.A | -3.0507907 |
| :135.A | -1.8450757 |
| :136.A | -1.9163833 |
| :137.A | -2.1605476 |
| :138.A | -2.8020985 |
| :139.A | -2.5567497 |
| :140.A | -2.1396983 |
| :141.A | 0.9739174 |
| :142.A | -2.6103306 |
| :143.A | -2.4713307 |
| :144.A | -2.3159851 |
| :145.A | -1.6662514 |
| :146.A | -1.0447044 |
| :147.A | -1.9941271 |
| :148.A | -1.2372224 |
| :149.A | -0.7805746 |
| :150.A | -3.2530176 |
In [109]:
#STRIDE of 2A(pro)
STRIDEP71_2A <- readr::read_tsv(file.path("STRIDE_3W95.txt"),col_names = TRUE)
STRIDEP71_2A_DMS <- STRIDEP71_2A %>% rename(position = colnames(STRIDEP71_2A)[1])
#Areaplot enrich2 score class and secondary structures for insertions
NS2A_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>862,indel<1013)
NS2A_2ndary_insertion <- cbind(NS2A_2ndary_insertion, STRIDEP71_2A$Secondary)
colnames(NS2A_2ndary_insertion)[colnames(NS2A_2ndary_insertion) == "STRIDEP71_2A$Secondary"] <- "Secondary"
NS2A_2ndary_insertion <- NS2A_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_insertion <- NS2A_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_insertion_fitness_class <- na.omit(NS2A_2ndary_insertion)
NS2A_2ndary_insertion_fitness_class$Class="N"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score<0,]$Class="L"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS2A_2ndary_insertion_fitness_class_Counts <- table(NS2A_2ndary_insertion_fitness_class$Class,NS2A_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS2A_2ndary_insertion_fitness_class_Counts)
NS2A_2ndary_insertion_fitness_class$Class <- factor(NS2A_2ndary_insertion_fitness_class$Class, levels = c("B","N","L"))
plot(as.factor(NS2A_2ndary_insertion_fitness_class$Secondary),as.factor(NS2A_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_insertion_fitness_class$Secondary),as.factor(NS2A_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for deletions
NS2A_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS2A_2ndary_del <- NS2A_2ndary_del %>% ungroup()
NS2A_2ndary_del <- filter(NS2A_2ndary_del,indel>862,indel<1013)
NS2A_2ndary_del <- cbind(NS2A_2ndary_del, STRIDEP71_2A$Secondary)
colnames(NS2A_2ndary_del)[colnames(NS2A_2ndary_del) == "STRIDEP71_2A$Secondary"] <- "Secondary"
NS2A_2ndary_del <- NS2A_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_del <- NS2A_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_del_fitness_class <- na.omit(NS2A_2ndary_del)
NS2A_2ndary_del_fitness_class$Class="N"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score<2,]$Class="N"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score<0,]$Class="L"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score>2,]$Class="B"
NS2A_2ndary_del_fitness_class_Counts <- table(NS2A_2ndary_del_fitness_class$Class,NS2A_2ndary_del_fitness_class$Secondary)
chisq.test(NS2A_2ndary_del_fitness_class_Counts)
NS2A_2ndary_del_fitness_class$Class <- factor(NS2A_2ndary_del_fitness_class$Class, levels = c("B", "N","L"))
plot(as.factor(NS2A_2ndary_del_fitness_class$Secondary),as.factor(NS2A_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_del_fitness_class$Secondary),as.factor(NS2A_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for DMS
NS2A_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS2A_2ndary_DMS <- filter(NS2A_2ndary_DMS,position>862,position<1013)
NS2A_2ndary_DMS <- select(NS2A_2ndary_DMS,position,score,AminoAcid)
NS2A_2ndary_DMS <- mutate(NS2A_2ndary_DMS,position=position-862)
NS2A_2ndary_DMS <- merge(NS2A_2ndary_DMS,STRIDEP71_2A_DMS,by="position")
NS2A_2ndary_DMS <- NS2A_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_DMS <- NS2A_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_DMS_fitness_class <- na.omit(NS2A_2ndary_DMS)
NS2A_2ndary_DMS_fitness_class$Class="N"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS2A_2ndary_DMS_fitness_class_Counts <- table(NS2A_2ndary_DMS_fitness_class$Class,NS2A_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS2A_2ndary_DMS_fitness_class_Counts)
NS2A_2ndary_DMS_fitness_class$Class <- factor(NS2A_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))
plot(as.factor(NS2A_2ndary_DMS_fitness_class$Secondary),as.factor(NS2A_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_DMS_fitness_class$Secondary),as.factor(NS2A_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
Rows: 150 Columns: 2 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Secondary dbl (1): Residue ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. Warning message in chisq.test(NS2A_2ndary_insertion_fitness_class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: NS2A_2ndary_insertion_fitness_class_Counts X-squared = 11.99, df = 4, p-value = 0.01742
png: 2
Warning message in chisq.test(NS2A_2ndary_del_fitness_class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: NS2A_2ndary_del_fitness_class_Counts X-squared = 5.3477, df = 4, p-value = 0.2534
png: 2
Pearson's Chi-squared test data: NS2A_2ndary_DMS_fitness_class_Counts X-squared = 63.698, df = 4, p-value = 4.839e-13
png: 2
In [110]:
#Function to create areaplots for different amino acids and tolerance at secondary structures in 2A
r2A_mutationtype_function <- function(dfi,AA,plotname) {
df <- na.omit(dfi)
df <- filter(df,AminoAcid==AA)
df$Class="N"
df[df$score<2,]$Class="N"
df[df$score<0,]$Class="L"
df[df$score>2,]$Class="B"
df$Class <- factor(df$Class, levels = c("B", "N", "L"))
df_Counts <- table(df$Class,df$Secondary)
df_Counts
plot(as.factor(df$Secondary),as.factor(df$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf(plotname,width = 4, height = 4)
plot(as.factor(df$Secondary),as.factor(df$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
return(df_Counts)
}
#Low alpha helix propensity
r2A_mutationtype_function(NS2A_2ndary_DMS,"G","Glycine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"P","Proline_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"Y","tyrosine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"I","IsoLeucine_DMS_2A.pdf")
#High alpha helix propensity
r2A_mutationtype_function(NS2A_2ndary_DMS,"A","Alanine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"M","met_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"E","Glu_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"L","Leucine_DMS_2A.pdf")
E H L
B 3 1 5
N 16 6 21
L 47 8 23
E H L
B 0 1 5
N 11 4 10
L 63 10 38
E H L
B 6 0 1
N 15 11 17
L 49 4 36
E H L
B 6 0 3
N 27 7 13
L 38 8 39
E H L
B 7 6 7
N 29 3 19
L 35 5 26
E H L
B 7 2 2
N 30 6 26
L 37 6 28
E H L
B 3 3 3
N 17 5 21
L 54 7 28
E H L
B 1 3 1
N 24 4 21
L 42 7 31
In [111]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 2C
parse2C_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {
dataframeindexed <- filter(dataframeoriginal,indel>1111,indel<1441)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1111)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)
}
#Insertion
parse2C_function(Scores_Insertional_Handle_Fullproteome,Insertions_2C_reindexed,"Insertions_2C_reindexed.csv")
#Deletion
parse2C_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_2C_reindexed,"Deletions_2C_reindexed.csv")
#DMS
Score_DMS_mean_2C <- Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_2C <- Score_DMS_mean_2C %>% rename(indel = colnames(Score_DMS_mean_2C)[1])
parse2C_function(Score_DMS_mean_2C,DMS_2C_reindexed,"DMS_2C_reindexed.csv")
In [112]:
#STRIDE of 2C
STRIDEP71_2C <- readr::read_tsv(file.path("STRIDE_5GQ1.txt"),col_names = TRUE)
STRIDEP71_2C <- STRIDEP71_2C %>% complete(Residue = seq(1,329,1))
STRIDEP71_2C_DMS <- STRIDEP71_2C %>% rename(position = colnames(STRIDEP71_2C)[1])
#Areaplot enrich2 score class and secondary structures for insertions
NS2C_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>1111,indel<1441)
NS2C_2ndary_insertion <- cbind(NS2C_2ndary_insertion, STRIDEP71_2C$Secondary)
colnames(NS2C_2ndary_insertion)[colnames(NS2C_2ndary_insertion) == "STRIDEP71_2C$Secondary"] <- "Secondary"
NS2C_2ndary_insertion <- NS2C_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_insertion <- NS2C_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2C_2ndary_insertion_fitness_class <- na.omit(NS2C_2ndary_insertion)
NS2C_2ndary_insertion_fitness_class$Class="N"
NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score<0,]$Class="L"
#NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS2C_2ndary_insertion_fitness_class_Counts <- table(NS2C_2ndary_insertion_fitness_class$Class,NS2C_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS2C_2ndary_insertion_fitness_class_Counts)
NS2C_2ndary_insertion_fitness_class$Class <- factor(NS2C_2ndary_insertion_fitness_class$Class, levels = c("N","L"))
plot(as.factor(NS2C_2ndary_insertion_fitness_class$Secondary),as.factor(NS2C_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.8,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_insertion_fitness_class$Secondary),as.factor(NS2C_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for deletions
NS2C_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS2C_2ndary_del <- NS2C_2ndary_del %>% ungroup()
NS2C_2ndary_del <- filter(NS2C_2ndary_del,indel>1111,indel<1441)
NS2C_2ndary_del <- cbind(NS2C_2ndary_del, STRIDEP71_2C$Secondary)
colnames(NS2C_2ndary_del)[colnames(NS2C_2ndary_del) == "STRIDEP71_2C$Secondary"] <- "Secondary"
NS2C_2ndary_del <- NS2C_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_del <- NS2C_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2C_2ndary_del_fitness_class <- na.omit(NS2C_2ndary_del)
NS2C_2ndary_del_fitness_class$Class="N"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score<2,]$Class="N"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score<0,]$Class="L"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score>2,]$Class="B"
NS2C_2ndary_del_fitness_class_Counts <- table(NS2C_2ndary_del_fitness_class$Class,NS2C_2ndary_del_fitness_class$Secondary)
chisq.test(NS2C_2ndary_del_fitness_class_Counts)
NS2C_2ndary_del_fitness_class$Class <- factor(NS2C_2ndary_del_fitness_class$Class, levels = c("B", "N","L"))
plot(as.factor(NS2C_2ndary_del_fitness_class$Secondary),as.factor(NS2C_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.8,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_del_fitness_class$Secondary),as.factor(NS2C_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for DMS
NS2C_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS2C_2ndary_DMS <- filter(NS2C_2ndary_DMS,position>1111,position<1441)
NS2C_2ndary_DMS <- select(NS2C_2ndary_DMS,position,score)
NS2C_2ndary_DMS <- mutate(NS2C_2ndary_DMS,position=position-1111)
NS2C_2ndary_DMS <- merge(NS2C_2ndary_DMS,STRIDEP71_2C_DMS,by="position")
NS2C_2ndary_DMS <- NS2C_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_DMS <- NS2C_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
write.csv(NS2C_2ndary_DMS,file ="NS2C_2ndary_DMS.csv",)
NS2C_2ndary_DMS_fitness_class <- na.omit(NS2C_2ndary_DMS)
NS2C_2ndary_DMS_fitness_class$Class="N"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS2C_2ndary_DMS_fitness_class_Counts <- table(NS2C_2ndary_DMS_fitness_class$Class,NS2C_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS2C_2ndary_DMS_fitness_class_Counts)
NS2C_2ndary_DMS_fitness_class$Class <- factor(NS2C_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))
plot(as.factor(NS2C_2ndary_DMS_fitness_class$Secondary),as.factor(NS2C_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.8,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_DMS_fitness_class$Secondary),as.factor(NS2C_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()
Rows: 207 Columns: 2 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Secondary dbl (1): Residue ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Chi-squared test for given probabilities data: NS2C_2ndary_insertion_fitness_class_Counts X-squared = 30.464, df = 2, p-value = 2.426e-07
png: 2
Warning message in chisq.test(NS2C_2ndary_del_fitness_class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: NS2C_2ndary_del_fitness_class_Counts X-squared = 0.98555, df = 2, p-value = 0.6109
png: 2
Warning message in chisq.test(NS2C_2ndary_DMS_fitness_class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: NS2C_2ndary_DMS_fitness_class_Counts X-squared = 16.477, df = 4, p-value = 0.002442
png: 2
In [113]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3A
parse3A_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {
dataframeindexed <- filter(dataframeoriginal,indel>1440,indel<1527)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1440)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "B")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)
}
#Insertion
parse3A_function(Scores_Insertional_Handle_Fullproteome,Insertions_3A_reindexed,"Insertions_3A_reindexed.csv")
#Deletion
parse3A_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3A_reindexed,"Deletions_3A_reindexed.csv")
#DMS
Score_DMS_mean_3A <- Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3A <- Score_DMS_mean_3A %>% rename(indel = colnames(Score_DMS_mean_3A)[1])
parse3A_function(Score_DMS_mean_3A,DMS_3A_reindexed,"DMS_3A_reindexed.csv")
In [114]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3C
parse3C_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {
dataframeindexed <- filter(dataframeoriginal,indel>1548,indel<1732)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1548)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)
}
#Insertion
parse3C_function(Scores_Insertional_Handle_Fullproteome,Insertions_3C_reindexed,"Insertions_3C_reindexed.csv")
#Deletion
parse3C_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3C_reindexed,"Deletions_3C_reindexed.csv")
#DMS
Score_DMS_mean_3C <- Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3C <- Score_DMS_mean_3C %>% rename(indel = colnames(Score_DMS_mean_3C)[1])
parse3C_function(Score_DMS_mean_3C,DMS_3C_reindexed,"DMS_3C_reindexed.csv")
In [115]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3D(pol)
parse3D_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {
dataframeindexed <- filter(dataframeoriginal,indel>1731,indel<2194)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1731)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)
}
#Insertion
parse3D_function(Scores_Insertional_Handle_Fullproteome,Insertions_3D_reindexed,"Insertions_3D_reindexed.csv")
#Deletion
parse3D_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3D_reindexed,"Deletions_3D_reindexed.csv")
#DMS
Score_DMS_mean_3D <- Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3D <- Score_DMS_mean_3D %>% rename(indel = colnames(Score_DMS_mean_3D)[1])
parse3D_function(Score_DMS_mean_3D,DMS_3D_reindexed,"DMS_3D_reindexed.csv")
In [116]:
#Heatmap showing contact sites of 3D with template RNA
#Reading in txt file with contact sites as identified by chimera
overlap_3D <- read_csv("Overlap_Residues_Selected_filtered.txt")
#Function to create a residue list that contain all sites with the contacts
create_DMS_heatmap_residue_list <- function(residues_df) {
residues_list <- residues_df[[1]]
df <- merged_df_indel_DMS
df <- mutate(df,position=position-1731)
df <- filter(df, position %in% residues_list)
df$position <- factor(df$position, levels = residues_list)
#Creating heatmap with the contact sites to show mutational robustness
DMS_Heatmap <- ggplot(df) +
geom_tile(aes(position, AminoAcid, fill = score)) +
scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
low = muted("hotpink"),
mid = "white",
high = muted("green"),
,space = "Lab",
na.value = "black") +
theme_classic()+ scale_x_discrete() +
coord_fixed()+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
filename <- paste0("DMS_Heatmap_3D_overlap", paste(collapse = "_"), ".pdf")
ggsave(filename, height = 4, width = 10)
return(DMS_Heatmap)
}
#Heatmap for 3D with only the contact sites
create_DMS_heatmap_residue_list(overlap_3D)
Rows: 48 Columns: 1 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," dbl (1): Residue_Overlap_3D ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In [117]:
#STRIDE of 3D
STRIDEP71_3D <- readr::read_tsv(file.path("STRIDE_3N6L.txt"),col_names = TRUE)
STRIDEP71_3D_DMS <- STRIDEP71_3D %>% rename(position = colnames(STRIDEP71_3D)[1])
#Areaplot enrich2 score class and secondary structures for insertions
NS3D_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>1731,indel<2194)
NS3D_2ndary_insertion <- cbind(NS3D_2ndary_insertion, STRIDEP71_3D$Secondary)
colnames(NS3D_2ndary_insertion)[colnames(NS3D_2ndary_insertion) == "STRIDEP71_3D$Secondary"] <- "Secondary"
NS3D_2ndary_insertion <- NS3D_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_insertion <- NS3D_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS3D_2ndary_insertion_fitness_class <- na.omit(NS3D_2ndary_insertion)
NS3D_2ndary_insertion_fitness_class$Class="N"
NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score<0,]$Class="L"
#NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS3D_2ndary_insertion_fitness_class_Counts <- table(NS3D_2ndary_insertion_fitness_class$Class,NS3D_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS3D_2ndary_insertion_fitness_class_Counts)
NS3D_2ndary_insertion_fitness_class$Class <- factor(NS3D_2ndary_insertion_fitness_class$Class, levels = c("N","L"))
plot(as.factor(NS3D_2ndary_insertion_fitness_class$Secondary),as.factor(NS3D_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_insertion_fitness_class$Secondary),as.factor(NS3D_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for deletions
NS3D_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS3D_2ndary_del <- NS3D_2ndary_del %>% ungroup()
NS3D_2ndary_del <- filter(NS3D_2ndary_del,indel>1731,indel<2194)
NS3D_2ndary_del <- cbind(NS3D_2ndary_del, STRIDEP71_3D$Secondary)
colnames(NS3D_2ndary_del)[colnames(NS3D_2ndary_del) == "STRIDEP71_3D$Secondary"] <- "Secondary"
NS3D_2ndary_del <- NS3D_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_del <- NS3D_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
write.csv(NS3D_2ndary_del,file ="NS3D_2ndary_del.csv",)
NS3D_2ndary_del_fitness_class <- na.omit(NS3D_2ndary_del)
NS3D_2ndary_del_fitness_class$Class="N"
NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score<2,]$Class="N"
NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score<0,]$Class="L"
#NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score>2,]$Class="B"
NS3D_2ndary_del_fitness_class_Counts <- table(NS3D_2ndary_del_fitness_class$Class,NS3D_2ndary_del_fitness_class$Secondary)
chisq.test(NS3D_2ndary_del_fitness_class_Counts)
NS3D_2ndary_del_fitness_class$Class <- factor(NS3D_2ndary_del_fitness_class$Class, levels = c("N","L"))
plot(as.factor(NS3D_2ndary_del_fitness_class$Secondary),as.factor(NS3D_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_del_fitness_class$Secondary),as.factor(NS3D_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
#Areaplot enrich2 score class and secondary structures for DMS
NS3D_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS3D_2ndary_DMS <- filter(NS3D_2ndary_DMS,position>1731,position<2194)
NS3D_2ndary_DMS <- select(NS3D_2ndary_DMS,position,score,AminoAcid)
NS3D_2ndary_DMS <- mutate(NS3D_2ndary_DMS,position=position-1731)
NS3D_2ndary_DMS <- merge(NS3D_2ndary_DMS,STRIDEP71_3D_DMS,by="position")
NS3D_2ndary_DMS <- NS3D_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_DMS <- NS3D_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS3D_2ndary_DMS_fitness_class <- na.omit(NS3D_2ndary_DMS)
NS3D_2ndary_DMS_fitness_class$Class="N"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS3D_2ndary_DMS_fitness_class_Counts <- table(NS3D_2ndary_DMS_fitness_class$Class,NS3D_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS3D_2ndary_DMS_fitness_class_Counts)
NS3D_2ndary_DMS_fitness_class$Class <- factor(NS3D_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))
plot(as.factor(NS3D_2ndary_DMS_fitness_class$Secondary),as.factor(NS3D_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_DMS_fitness_class$Secondary),as.factor(NS3D_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
Rows: 462 Columns: 2 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Secondary dbl (1): Residue ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Chi-squared test for given probabilities data: NS3D_2ndary_insertion_fitness_class_Counts X-squared = 73.753, df = 2, p-value < 2.2e-16
png: 2
Warning message in chisq.test(NS3D_2ndary_del_fitness_class_Counts): “Chi-squared approximation may be incorrect”
Pearson's Chi-squared test data: NS3D_2ndary_del_fitness_class_Counts X-squared = 2.6817, df = 2, p-value = 0.2616
png: 2
Pearson's Chi-squared test data: NS3D_2ndary_DMS_fitness_class_Counts X-squared = 32.706, df = 4, p-value = 1.372e-06
png: 2
In [118]:
#Create areaplots for different amino acids and tolerance at secondary structures in 3D
#Low alphahelix propensity
r2A_mutationtype_function(NS3D_2ndary_DMS,"G","Glycine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"P","Proline_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"Y","tyrosine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"I","IsoLeucine_DMS_3D.pdf")
#High alphahelix propensity
r2A_mutationtype_function(NS3D_2ndary_DMS,"A","Alanine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"M","met_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"E","Glu_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"L","Leucine_DMS_3D.pdf")
E H L
B 0 4 3
N 8 34 29
L 58 156 141
E H L
B 0 2 3
N 1 14 14
L 60 179 164
E H L
B 0 7 3
N 7 18 24
L 53 164 165
E H L
B 3 4 6
N 9 39 35
L 52 139 146
E H L
B 1 9 9
N 12 40 43
L 49 135 136
E H L
B 0 13 8
N 11 53 40
L 53 127 145
E H L
B 0 5 1
N 3 15 23
L 60 160 163
E H L
B 1 4 6
N 7 35 28
L 53 137 144
In [119]:
#Figure 5. Structural interpretation of mutational effects of the Capsid
#Creating a dataframe with relative enrichment focused on VP1
Scores_Insertional_Handle_VP1 <- filter(Scores_Insertional_Handle_Fullproteome,indel>565,indel<863)
Scores_Insertional_Handle_VP1 <- mutate(Scores_Insertional_Handle_VP1,score=2^score)
Scores_Deletions_VP1 <- filter(Scores_Deletions_Fullproteome,indel>565,indel<863)
Scores_Deletions_VP1 <- mutate(Scores_Deletions_VP1,score=2^score)
Scores_Deletions1AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="1AAdel")
Scores_Deletions2AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="2AAdel")
Scores_Deletions3AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="3AAdel")
VP1_P2_DMS_Enrich2_long <- filter(Fullproteome_P2_DMS_Enrich2_long,position>565,position<863)
VP1_P2_DMS_Enrich2_long_exponention <- mutate(VP1_P2_DMS_Enrich2_long,score=2^score)
#Function to create geompoint of different mutation types at VP1. VP1 loops are annotated using geomrect
plot_function_VP1_inserts <- function(data,color,plotname,lim,limm) {
plot <- ggplot(data)+geom_point(aes(indel,score),color=color,size=0.5)+
theme_classic() +coord_cartesian(ylim = c(0, 75)) +xlab("Residue Position") +ylab("Relative Enrichment") +
geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 100),alpha = 0.002,fill = "dimgrey",size = 0) +
geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0)+
geom_text(data = subset(data, score > 10),
aes(indel, score, label = paste(indel)),
vjust = -1,hjust=1,size = 1.5)
ggsave(plot,filename=plotname,width = 4,height=1.5)
return(plot)
}
# Usage of the plotting function
plot_function_VP1_inserts(Scores_Insertional_Handle_VP1,"#1558a9","Scores_Insertional_Handle_VP1.pdf")
plot_function_VP1_inserts(Scores_Deletions2AA_VP1,"#a91515","Scores_Deletions2AA_VP1.pdf")
plot_function_VP1_inserts(Scores_Deletions3AA_VP1,"#a91515","Scores_Deletions3AA_VP1.pdf")
#Geompoint for 1AA deletion at VP1
Scores_Deletions1AA_VP1_plot <- ggplot(Scores_Deletions1AA_VP1)+geom_point(aes(indel,score),color= "#a91515",size=0.5)+
theme_classic() +coord_cartesian(ylim = c(0, 120)) +xlab("Residue Position") +ylab("Relative Enrichment") +
geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 145),alpha = 0.002,fill = "dimgrey",size = 0) +
geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0)+
geom_text(data = subset(Scores_Deletions1AA_VP1, score > 10),
aes(indel, score, label = paste(indel)),
vjust = -1,hjust=1,size = 1.5)
Scores_Deletions1AA_VP1_plot
#Save figure
ggsave(Scores_Deletions1AA_VP1_plot,filename="Scores_Deletions1AA_VP1.pdf",width = 4,height=1.5)
#Geompoint for DMS at VP1
DMS_tophit_EV71_VP1 <- ggplot(VP1_P2_DMS_Enrich2_long_exponention)+geom_point(aes(position,score),color="#3c7430",size=0.5)+
theme_classic() +coord_cartesian(ylim = c(0, 100)) +xlab("Residue Position") +ylab("Relative Enrichment") +
geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 100),alpha = 0.002,fill = "dimgrey",size = 0) +
geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0)+
geom_text(data = subset(VP1_P2_DMS_Enrich2_long_exponention, score > 50),
aes(position, score, label = paste(position,AminoAcid)),
vjust = -2,hjust=1,size = 1.5)
DMS_tophit_EV71_VP1
#Save Figure
ggsave(DMS_tophit_EV71_VP1,filename="DMS_tophit_EV71_VP1.pdf",width = 4,height=1.5)
Warning message: “Removed 2 rows containing missing values (`geom_point()`).” Warning message: “Removed 2 rows containing missing values (`geom_point()`).”
Warning message: “Removed 1 rows containing missing values (`geom_point()`).” Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Warning message: “Removed 1 rows containing missing values (`geom_point()`).”
Warning message: “Removed 1 rows containing missing values (`geom_point()`).” Warning message: “Removed 297 rows containing missing values (`geom_point()`).”
Warning message: “Removed 297 rows containing missing values (`geom_point()`).”
In [120]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: Insertional Handle
Insertions_Strucutral_P2 <- filter(Scores_Insertional_Handle_Fullproteome,indel<863)
Insertions_Strucutral_P2 <- Insertions_Strucutral_P2 %>% replace(is.na(.), "")
Insertions_Strucutral_P2 <- mutate(Insertions_Strucutral_P2,score=2^score)
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Insertions_Strucutral_P2_reindexed$indel <- paste0(Insertions_Strucutral_P2_reindexed$indel, ".")
Insertions_Strucutral_P2_reindexed$indel[1:69] <- paste0(Insertions_Strucutral_P2_reindexed$indel[1:69], "D")
Insertions_Strucutral_P2_reindexed$indel[70:323] <- paste0(Insertions_Strucutral_P2_reindexed$indel[70:323], "B")
Insertions_Strucutral_P2_reindexed$indel[324:565] <- paste0(Insertions_Strucutral_P2_reindexed$indel[324:565], "C")
Insertions_Strucutral_P2_reindexed$indel[566:862] <- paste0(Insertions_Strucutral_P2_reindexed$indel[566:862], "A")
Insertions_Strucutral_P2_reindexed$indel <- paste(":", Insertions_Strucutral_P2_reindexed$indel, sep = "")
write.csv(Insertions_Strucutral_P2_reindexed,file ="Capsid_Chimera_insertions_Attached_p2.csv",)
In [121]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 1 AA deletion
Deletions_Strucutral_3bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="1AAdel")
Deletions_Strucutral_3bp_P2 <- filter(Deletions_Strucutral_3bp_P2,indel<863)
Deletions_Strucutral_3bp_P2 <- mutate(Deletions_Strucutral_3bp_P2,score=2^score)
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_3bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel, ".")
Deletions_Strucutral_3bp_P2_reindexed$indel[1:69] <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_3bp_P2_reindexed$indel[70:323] <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_3bp_P2_reindexed$indel[324:565] <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_3bp_P2_reindexed$indel[566:862] <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_3bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_3bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_3bp_P2_reindexed <- na.omit(Deletions_Strucutral_3bp_P2_reindexed)
write.csv(Deletions_Strucutral_3bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_3bp_p2.csv",)
In [122]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 2 AA deletion
Deletions_Strucutral_6bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="2AAdel")
Deletions_Strucutral_6bp_P2 <- filter(Deletions_Strucutral_6bp_P2,indel<863)
Deletions_Strucutral_6bp_P2 <- mutate(Deletions_Strucutral_6bp_P2,score=2^score)
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_6bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel, ".")
Deletions_Strucutral_6bp_P2_reindexed$indel[1:69] <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_6bp_P2_reindexed$indel[70:323] <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_6bp_P2_reindexed$indel[324:565] <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_6bp_P2_reindexed$indel[566:862] <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_6bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_6bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_6bp_P2_reindexed <- na.omit(Deletions_Strucutral_6bp_P2_reindexed)
write.csv(Deletions_Strucutral_6bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_6bp_p2.csv",)
In [123]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 3 AA deletion
Deletions_Strucutral_9bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="3AAdel")
Deletions_Strucutral_9bp_P2 <- filter(Deletions_Strucutral_9bp_P2,indel<863)
Deletions_Strucutral_9bp_P2 <- mutate(Deletions_Strucutral_9bp_P2,score=2^score)
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_9bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel, ".")
Deletions_Strucutral_9bp_P2_reindexed$indel[1:69] <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_9bp_P2_reindexed$indel[70:323] <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_9bp_P2_reindexed$indel[324:565] <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_9bp_P2_reindexed$indel[566:862] <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_9bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_9bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_9bp_P2_reindexed <- na.omit(Deletions_Strucutral_9bp_P2_reindexed)
write.csv(Deletions_Strucutral_9bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_9bp_p2.csv",)
In [124]:
#Parsing dataframe for mean exponentiated score as an attribute for chimera for the capsid proteins: DMS
DMS_Strucutral_P2 <- filter(Fullproteome_P2_DMS_Enrich2_long,position<863)
DMS_Strucutral_P2 <- mutate(DMS_Strucutral_P2,score=2^score)
DMS_Strucutral_P2 <- DMS_Strucutral_P2 %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
DMS_Strucutral_P2 <- DMS_Strucutral_P2 %>% replace(is.na(.), "")
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2 %>% mutate(position = ifelse(row_number() %in% c(70:323), position - 69, position))
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2_reindexed %>% mutate(position = ifelse(row_number() %in% c(324:565), position - 323, position))
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2_reindexed %>% mutate(position = ifelse(row_number() %in% c(566:862), position - 565, position))
DMS_Strucutral_P2_reindexed$position <- paste0(DMS_Strucutral_P2_reindexed$position, ".")
DMS_Strucutral_P2_reindexed$position[1:69] <- paste0(DMS_Strucutral_P2_reindexed$position[1:69], "D")
DMS_Strucutral_P2_reindexed$position[70:323] <- paste0(DMS_Strucutral_P2_reindexed$position[70:323], "B")
DMS_Strucutral_P2_reindexed$position[324:565] <- paste0(DMS_Strucutral_P2_reindexed$position[324:565], "C")
DMS_Strucutral_P2_reindexed$position[566:862] <- paste0(DMS_Strucutral_P2_reindexed$position[566:862], "A")
DMS_Strucutral_P2_reindexed$position <- paste(":", DMS_Strucutral_P2_reindexed$position, sep = "")
write.csv(DMS_Strucutral_P2_reindexed,file ="Capsid_Chimera_DMS_Attached_p2.csv",)
In [125]:
#Figure 6. InDels as a contributor of Enterovirus A species evolution
#Function for creating a heatmap
create_DMS_heatmap <- function(lower_bound, upper_bound) {
df <- filter(merged_df_indel_DMS, position > lower_bound, position < upper_bound)
DMS_Heatmap <- ggplot(df) +
geom_tile(aes(position, AminoAcid, fill = score)) +
scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
low = muted("hotpink"),
mid = "white",
high = muted("green"),
space = "Lab",
na.value = "black") +
theme_classic() +
coord_fixed()
filename <- paste0("DMS_Heatmap_", lower_bound, "_to_", upper_bound, ".pdf")
ggsave(filename, height = 6, width = 6)
return(DMS_Heatmap)
}
#Create the heatmaps at N-termini of VP1 and 2A
create_DMS_heatmap(lower_bound = 549, upper_bound = 601)
create_DMS_heatmap(lower_bound = 849, upper_bound = 901)
In [126]:
#Supplementary Figure 7. Deep Insertion, Deletion, and Mutational Scanning at InDel hotspot regions
#Parsing the input competition pooled experiment
Scores_Competition_input_plotting <- Scores_Competition_input
Scores_Competition_input_plotting$position <- gsub("p\\.\\((\\d+).*\\)", "\\1", Scores_Competition_input_plotting$hgvs)
Scores_Competition_input_plotting$Variant <- gsub("p\\.\\(\\d+(.*)\\)", "\\1", Scores_Competition_input_plotting$hgvs)
Scores_Competition_input_plotting$position <- as.numeric(Scores_Competition_input_plotting$position)
#Bar plot for input plasmid library at the N-terminus of VP1
Competition_input_plot_Capsid <-ggplot(Scores_Competition_input_plotting)+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count),binwidth = 1,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+ coord_cartesian(xlim = c(577, 612))
ggsave("Competition_input_plot_Capsid.pdf",width = 2,height=2)
Competition_input_plot_Capsid
#Bar plot for input plasmid library at the N-terminus of 2A
Competition_input_plot_Replication <-ggplot(Scores_Competition_input_plotting)+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count),binwidth = 1,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+ coord_cartesian(xlim = c(863, 910))
Competition_input_plot_Replication
ggsave("Competition_input_plot_Replication.pdf",width = 2,height=2)
#Statistics included
#Count positions with count more than 1
positions_above_1_competition <- sum(Scores_Competition_input_plotting$count > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_competition/1932, "\n")
Number of positions with count more than 1: 1
In [127]:
#Parsing the enrich2 score dataframes for the competition pooled dataframe at P2
Competition_Pool_Enrich2_Scores$position <- gsub("p\\.\\((\\d+).*\\)", "\\1", Competition_Pool_Enrich2_Scores$hgvs)
Competition_Pool_Enrich2_Scores$Variant <- gsub("p\\.\\(\\d+(.*)\\)", "\\1", Competition_Pool_Enrich2_Scores$hgvs)
Competition_Pool_Enrich2_Scores$position <- as.numeric(Competition_Pool_Enrich2_Scores$position)
Competition_Pool_Enrich2_Scores <- select(Competition_Pool_Enrich2_Scores,position,Variant,score)
Competition_Pool_Enrich2_Scores
Warning message in eval(expr, envir, enclos): “NAs introduced by coercion”
| position | Variant | score |
|---|---|---|
| <dbl> | <chr> | <dbl> |
| 577 | _1AAdel | -5.0212407 |
| 577 | _2AAdel | -6.7742591 |
| 577 | _3AAdel | -5.0938696 |
| 577 | A | -4.0681920 |
| 577 | C | -5.4932380 |
| 577 | D | -5.6746217 |
| 577 | E | -5.3712305 |
| 577 | F | -5.1615022 |
| 577 | G | -5.0592367 |
| 577 | H | -5.2259058 |
| 577 | handle | -4.9600395 |
| 577 | K | -1.8548324 |
| 577 | L | -2.1140470 |
| 577 | M | -0.7525873 |
| 577 | N | -5.2285782 |
| 577 | P | -4.3889911 |
| 577 | Q | -4.4021126 |
| 577 | R | -2.3668099 |
| 577 | S | -4.7022723 |
| 577 | T | -1.7286223 |
| 577 | V | 1.0628603 |
| 577 | W | -3.3041341 |
| 577 | Y | -3.2116524 |
| 578 | _1AAdel | -4.2644383 |
| 578 | _2AAdel | -2.1168360 |
| 578 | _3AAdel | -4.2882355 |
| 578 | A | -3.4267051 |
| 578 | C | -6.2138290 |
| 578 | D | -4.6105516 |
| 578 | E | -1.2980551 |
| ⋮ | ⋮ | ⋮ |
| 909 | R | 1.1091144 |
| 909 | S | 0.2636672 |
| 909 | T | 0.8568695 |
| 909 | V | 0.2414721 |
| 909 | W | 1.0353956 |
| 909 | Y | 1.1249580 |
| 910 | _1AAdel | -6.2983892 |
| 910 | _2AAdel | -5.0388709 |
| 910 | _3AAdel | -5.9048265 |
| 910 | A | 0.9576387 |
| 910 | C | 0.3175552 |
| 910 | D | 1.1818012 |
| 910 | E | 1.4313338 |
| 910 | F | 1.3379674 |
| 910 | G | 0.5113716 |
| 910 | H | 0.9969034 |
| 910 | handle | -6.1848293 |
| 910 | I | -4.6552593 |
| 910 | K | -1.4978718 |
| 910 | L | -0.6404986 |
| 910 | M | 0.9430934 |
| 910 | N | 0.6567146 |
| 910 | P | -3.4641825 |
| 910 | R | -1.2790269 |
| 910 | S | -0.4744694 |
| 910 | T | -3.2459876 |
| 910 | V | -4.7139707 |
| 910 | W | 1.4848147 |
| 910 | Y | 1.5962951 |
| NA | _wt | 0.0000000 |
In [128]:
#Creating a dataframe with the exponentiated scores
Competition_Pool_Enrich2_Scores$Variant <- factor(Competition_Pool_Enrich2_Scores$Variant, levels = c(levels = "handle","_1AAdel","_2AAdel","_3AAdel","P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
Competition_Pool_Enrich2_Scores_exponen <- mutate(Competition_Pool_Enrich2_Scores,score=2^score)
Competition_Pool_Enrich2_Scores_exponen_VP1 <- filter(Competition_Pool_Enrich2_Scores_exponen,position<620)
Competition_Pool_Enrich2_Scores_exponen_2A <- filter(Competition_Pool_Enrich2_Scores_exponen,position>850)
#Boxplot for different variants for the competition pooled experiment
Boxplot_Competition_Variants <- ggplot(Competition_Pool_Enrich2_Scores, aes(y = score, x = Variant))+ylim(-6,6)+geom_boxplot(outlier.shape = NA,width = 0.5)+theme_classic()+labs(y="Median Scores",x="")+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Boxplot_Competition_Variants
ggsave("Boxplot_Competition_Variants.pdf", height = 2, width = 4)
#Geompoint for variants at the N-terminus of VP1
DMS_tophit_competitionpool_VP1 <- ggplot(Competition_Pool_Enrich2_Scores_exponen_VP1)+geom_point(aes(position,score),color="#7c7b7b",size=0.5)+
theme_classic() +coord_cartesian(ylim = c(0, 20)) +xlab("Residue Position") +ylab("Relative Enrichment") +
geom_text(data = subset(Competition_Pool_Enrich2_Scores_exponen_VP1, score > 10),
aes(position, score, label = paste(position,Variant)),
vjust = 0,hjust=0,size = 1)
#Save figure
ggsave("DMS_tophit_competitionpool_VP1.pdf", height = 2, width = 2)
#Geompoint for variants at the N-terminus of 2A
DMS_tophit_competitionpool_2A <- ggplot(Competition_Pool_Enrich2_Scores_exponen_2A)+geom_point(aes(position,score),color="#7c7b7b",size=0.5)+
theme_classic() +coord_cartesian(ylim = c(0, 20)) +xlab("Residue Position") +ylab("Relative Enrichment") +
geom_text(data = subset(Competition_Pool_Enrich2_Scores_exponen_2A, score > 10),
aes(position, score, label = paste(position,Variant)),
vjust = 0,hjust=0,size = 1)
#Save figure
ggsave("DMS_tophit_competitionpool_2A.pdf", height = 2, width = 2)
DMS_tophit_competitionpool_VP1
DMS_tophit_competitionpool_2A
#DMFE for deletions
Competition_Pool_Enrich2_Scores_Deletions <- filter(Competition_Pool_Enrich2_Scores,Variant %in% c("_1AAdel", "_2AAdel","_3AAdel"))
histo_competition_pool_Deletions <- select(Competition_Pool_Enrich2_Scores_Deletions,score)
histo_competition_pool_histo_Deletions <- ggplot(Competition_Pool_Enrich2_Scores_Deletions, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_Deletions
ggsave("histo_competition_pool_histo_Deletions.pdf",width = 2,height=1.5)
#Inset for DMFE for deletions
histo_del_sel_nonlethal_competition_pool <- filter(histo_competition_pool_Deletions,score>0)
del_histogram_nonlethal_competition <- ggplot(histo_del_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
del_histogram_nonlethal_competition
#Save figure
ggsave("del_histogram_nonlethal_competition.pdf",width = 2,height=1.5)
#DMFE for Insertions
Competition_Pool_Enrich2_Scores_Insertions <- filter(Competition_Pool_Enrich2_Scores,Variant=="handle")
histo_competition_pool_Insertions <- select(Competition_Pool_Enrich2_Scores_Insertions,score)
histo_competition_pool_histo_Insertions <- ggplot(histo_competition_pool_Insertions, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#1558a9", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_Insertions
ggsave("histo_competition_pool_Insertions.pdf",width = 2,height=1.5)
#Inset for DMFE for Insertions
histo_insertions_sel_nonlethal_competition_pool <- filter(histo_competition_pool_Insertions,score>0)
insertions_histogram_nonlethal_competition <- ggplot(histo_insertions_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#1558a9", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
insertions_histogram_nonlethal_competition
#Save figure
ggsave("insertions_histogram_nonlethal_competition.pdf",width = 2,height=1.5)
#DMFE for DMS
Competition_Pool_Enrich2_Scores_AA <- filter(Competition_Pool_Enrich2_Scores,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
histo_competition_pool_AA <- select(Competition_Pool_Enrich2_Scores_AA,score)
histo_competition_pool_histo_AA <- ggplot(histo_competition_pool_AA, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#3c7430", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_AA
#Save figure
ggsave("histo_competition_pool_histo_AA.pdf",width = 2,height=1.5)
#Inset for DMFE for DMS
histo_AA_sel_nonlethal_competition_pool <- filter(histo_competition_pool_AA,score>0)
AA_histogram_nonlethal_competition <- ggplot(histo_AA_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#3c7430", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
AA_histogram_nonlethal_competition
#Save figure
ggsave("AA_histogram_nonlethal_competition.pdf",width = 2,height=1.5)
Warning message: “Removed 160 rows containing non-finite values (`stat_boxplot()`).” Warning message: “Removed 160 rows containing non-finite values (`stat_boxplot()`).”
In [129]:
#Dataframe for pooled capsid Insertional Handle
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- filter(Competition_Pool_Enrich2_Scores_Capsid_Insertion,Variant=="handle")
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- Competition_Pool_Enrich2_Scores_Capsid_Insertion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_Insertion)[3])
Competition_Pool_Enrich2_Scores_Capsid_Insertion
#Dataframe for capsid Insertional Handle non-normalized experiment
Capsid_insertion_N_terminus_VP1 <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_insertion_N_terminus_VP1 <- filter(Capsid_insertion_N_terminus_VP1,AminoAcid =="Ins")
Capsid_insertion_N_terminus_VP1$AminoAcid <- gsub("Ins", "handle", Capsid_insertion_N_terminus_VP1$AminoAcid)
Capsid_insertion_N_terminus_VP1 <- Capsid_insertion_N_terminus_VP1 %>% rename(Variant = colnames(Capsid_insertion_N_terminus_VP1)[2])
Capsid_insertion_N_terminus_VP1 <- Capsid_insertion_N_terminus_VP1 %>% rename(score_nonwt = colnames(Capsid_insertion_N_terminus_VP1)[3])
Capsid_insertion_N_terminus_VP1
Merged_Capsid_insertion_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Capsid_Insertion,Capsid_insertion_N_terminus_VP1,all.x=TRUE,all.y=TRUE)
Merged_Capsid_insertion_wt_nonwt
#Scatter plot capsid insertion wildtype vs non wildtype normalized
Scatter_plot_CapsidInsertion_wt_nonwt <- ggplot(Merged_Capsid_insertion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Enrich2 score", y = "Enrich2 scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidInsertion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidInsertion_wt_nonwt,filename = "Scatter_plot_CapsidInsertion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_insertion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_insertion_wt_nonwt)
summary_model_Capsid_Insertion <- summary(model_Capsid_insertion)
summary_model_Capsid_Insertion
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <fct> | <dbl> |
| 577 | handle | -4.9600395 |
| 578 | handle | -6.4340859 |
| 579 | handle | -5.2972617 |
| 580 | handle | -6.2122032 |
| 581 | handle | -5.5548090 |
| 582 | handle | -6.2967363 |
| 583 | handle | -4.6068378 |
| 584 | handle | -6.1976971 |
| 585 | handle | -0.1871479 |
| 586 | handle | -3.3236263 |
| 587 | handle | -4.0581977 |
| 588 | handle | -4.0090099 |
| 589 | handle | -3.5556873 |
| 590 | handle | -2.9296332 |
| 591 | handle | -3.1259828 |
| 592 | handle | -2.6610715 |
| 593 | handle | -2.9584803 |
| 594 | handle | -4.9355758 |
| 595 | handle | -5.1145645 |
| 596 | handle | -6.0567015 |
| 597 | handle | -6.0202949 |
| 598 | handle | -5.1790274 |
| 599 | handle | -5.1419044 |
| 600 | handle | -6.0524865 |
| 601 | handle | -6.1885227 |
| 602 | handle | -6.5010098 |
| 603 | handle | -4.8790012 |
| 604 | handle | -6.8700823 |
| 605 | handle | -5.5604899 |
| 606 | handle | -6.5090689 |
| 607 | handle | -6.3777329 |
| 608 | handle | -6.6716882 |
| 609 | handle | -6.4625171 |
| 610 | handle | -6.5611980 |
| 611 | handle | -6.2211639 |
| 612 | handle | -6.0795715 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <chr> | <dbl> | <chr> |
| 577 | handle | -2.296280 | C |
| 578 | handle | -3.992367 | C |
| 579 | handle | -3.218787 | C |
| 580 | handle | -3.369305 | C |
| 581 | handle | -3.725543 | C |
| 582 | handle | -3.813319 | C |
| 583 | handle | -2.461757 | C |
| 584 | handle | 2.109562 | C |
| 585 | handle | 6.061249 | C |
| 586 | handle | 1.479317 | C |
| 587 | handle | 1.679606 | C |
| 588 | handle | -2.499626 | C |
| 589 | handle | 1.083437 | C |
| 590 | handle | 3.218903 | C |
| 591 | handle | 1.051457 | C |
| 592 | handle | 1.730435 | C |
| 593 | handle | -3.813319 | C |
| 594 | handle | -3.929545 | C |
| 595 | handle | -3.934520 | C |
| 596 | handle | -3.813319 | C |
| 597 | handle | -3.846380 | C |
| 598 | handle | -3.840945 | C |
| 599 | handle | -3.484659 | C |
| 600 | handle | -3.888826 | C |
| 601 | handle | -3.924545 | C |
| 602 | handle | -4.175859 | C |
| 603 | handle | -3.208583 | C |
| 604 | handle | -4.268863 | C |
| 605 | handle | -3.360533 | C |
| 606 | handle | -4.086186 | C |
| 607 | handle | -4.103100 | C |
| 608 | handle | -3.978216 | C |
| 609 | handle | -3.767466 | C |
| 610 | handle | -4.081912 | C |
| 611 | handle | -3.784908 | C |
| 612 | handle | -3.537702 | C |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <fct> | <dbl> | <dbl> | <chr> |
| 577 | handle | -4.9600395 | -2.296280 | C |
| 578 | handle | -6.4340859 | -3.992367 | C |
| 579 | handle | -5.2972617 | -3.218787 | C |
| 580 | handle | -6.2122032 | -3.369305 | C |
| 581 | handle | -5.5548090 | -3.725543 | C |
| 582 | handle | -6.2967363 | -3.813319 | C |
| 583 | handle | -4.6068378 | -2.461757 | C |
| 584 | handle | -6.1976971 | 2.109562 | C |
| 585 | handle | -0.1871479 | 6.061249 | C |
| 586 | handle | -3.3236263 | 1.479317 | C |
| 587 | handle | -4.0581977 | 1.679606 | C |
| 588 | handle | -4.0090099 | -2.499626 | C |
| 589 | handle | -3.5556873 | 1.083437 | C |
| 590 | handle | -2.9296332 | 3.218903 | C |
| 591 | handle | -3.1259828 | 1.051457 | C |
| 592 | handle | -2.6610715 | 1.730435 | C |
| 593 | handle | -2.9584803 | -3.813319 | C |
| 594 | handle | -4.9355758 | -3.929545 | C |
| 595 | handle | -5.1145645 | -3.934520 | C |
| 596 | handle | -6.0567015 | -3.813319 | C |
| 597 | handle | -6.0202949 | -3.846380 | C |
| 598 | handle | -5.1790274 | -3.840945 | C |
| 599 | handle | -5.1419044 | -3.484659 | C |
| 600 | handle | -6.0524865 | -3.888826 | C |
| 601 | handle | -6.1885227 | -3.924545 | C |
| 602 | handle | -6.5010098 | -4.175859 | C |
| 603 | handle | -4.8790012 | -3.208583 | C |
| 604 | handle | -6.8700823 | -4.268863 | C |
| 605 | handle | -5.5604899 | -3.360533 | C |
| 606 | handle | -6.5090689 | -4.086186 | C |
| 607 | handle | -6.3777329 | -4.103100 | C |
| 608 | handle | -6.6716882 | -3.978216 | C |
| 609 | handle | -6.4625171 | -3.767466 | C |
| 610 | handle | -6.5611980 | -4.081912 | C |
| 611 | handle | -6.2211639 | -3.784908 | C |
| 612 | handle | -6.0795715 | -3.537702 | C |
`geom_smooth()` using formula = 'y ~ x' `geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_insertion_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-3.0259 -0.5170 -0.1867 0.5555 2.8669
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.11692 0.20625 -19.961 < 2e-16 ***
score_nonwt 0.44803 0.05886 7.612 7.62e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.925 on 34 degrees of freedom
Multiple R-squared: 0.6302, Adjusted R-squared: 0.6193
F-statistic: 57.94 on 1 and 34 DF, p-value: 7.622e-09
In [130]:
#Dataframe for pooled replication Insertional Handle
Competition_Pool_Enrich2_Scores_Replication_Insertion <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replication_Insertion <- filter(Competition_Pool_Enrich2_Scores_Replication_Insertion,Variant=="handle")
Competition_Pool_Enrich2_Scores_Replication_Insertion <- Competition_Pool_Enrich2_Scores_Replication_Insertion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replication_Insertion)[3])
Competition_Pool_Enrich2_Scores_Replication_Insertion
#Dataframe for replication Insertional Handle non-normalized experiment
Replication_insertion_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_insertion_N_terminus_2A <- filter(Replication_insertion_N_terminus_2A,AminoAcid =="Ins")
Replication_insertion_N_terminus_2A$AminoAcid <- gsub("Ins", "handle", Replication_insertion_N_terminus_2A$AminoAcid)
Replication_insertion_N_terminus_2A <- Replication_insertion_N_terminus_2A %>% rename(Variant = colnames(Replication_insertion_N_terminus_2A)[2])
Replication_insertion_N_terminus_2A <- Replication_insertion_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_insertion_N_terminus_2A)[3])
Replication_insertion_N_terminus_2A
Merged_Replication_DMS_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Replication_Insertion,Replication_insertion_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Replication_DMS_wt_nonwt
#Scatter plot replication insertion wildtype vs non wildtype normalized
Scatter_plot_ReplicationInsertion_wt_nonwt <- ggplot(Merged_Replication_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Enrich2 score", y = "Enrich2 scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationInsertion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_ReplicationInsertion_wt_nonwt,filename = "Scatter_plot_ReplicationInsertion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Replication_insertion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_DMS_wt_nonwt)
summary_model_Replication_insertion <- summary(model_Replication_insertion)
summary_model_Replication_insertion
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <fct> | <dbl> |
| 863 | handle | 0.98036629 |
| 864 | handle | 0.88918593 |
| 865 | handle | 0.84169479 |
| 866 | handle | 0.29912089 |
| 867 | handle | 0.16526489 |
| 868 | handle | 0.06288616 |
| 869 | handle | -0.20596919 |
| 870 | handle | -0.72549520 |
| 871 | handle | -1.55513454 |
| 872 | handle | -2.34895633 |
| 873 | handle | -1.95661272 |
| 874 | handle | -2.97841919 |
| 875 | handle | -2.64947742 |
| 876 | handle | -2.95698400 |
| 877 | handle | -3.44412083 |
| 878 | handle | -4.16802016 |
| 879 | handle | -5.29303557 |
| 880 | handle | -5.85205047 |
| 881 | handle | -5.88254839 |
| 882 | handle | -5.98024507 |
| 883 | handle | -6.06716185 |
| 884 | handle | -6.01373034 |
| 885 | handle | -5.70197749 |
| 886 | handle | -5.51815555 |
| 887 | handle | -5.78259666 |
| 888 | handle | -5.63373396 |
| 889 | handle | -6.15858595 |
| 890 | handle | -5.68991098 |
| 891 | handle | -5.53246707 |
| 892 | handle | -5.61100571 |
| 893 | handle | -5.64331870 |
| 894 | handle | -5.66533216 |
| 895 | handle | -5.59444904 |
| 896 | handle | -5.55355748 |
| 897 | handle | -5.75738524 |
| 898 | handle | -5.63373396 |
| 899 | handle | -5.70795658 |
| 900 | handle | -5.59110455 |
| 901 | handle | -5.30174603 |
| 902 | handle | -5.58438184 |
| 903 | handle | -5.54306788 |
| 904 | handle | -5.53246707 |
| 905 | handle | -6.03114101 |
| 906 | handle | -6.31801369 |
| 907 | handle | -6.37620621 |
| 908 | handle | -5.67769708 |
| 909 | handle | -5.68687146 |
| 910 | handle | -6.18482926 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <chr> | <dbl> | <chr> |
| 863 | handle | 5.6617765 | D |
| 864 | handle | 5.6148419 | D |
| 865 | handle | 5.3758824 | D |
| 866 | handle | 5.2051635 | D |
| 867 | handle | 4.9301044 | D |
| 868 | handle | 4.5716542 | D |
| 869 | handle | 4.1886510 | D |
| 870 | handle | 3.5357299 | D |
| 871 | handle | 1.4974590 | D |
| 872 | handle | -5.8030172 | D |
| 873 | handle | 0.8720260 | D |
| 874 | handle | -5.0070473 | D |
| 875 | handle | -4.8366361 | D |
| 876 | handle | -4.8366361 | D |
| 877 | handle | -5.1762261 | D |
| 878 | handle | -4.9694842 | D |
| 879 | handle | -5.2511675 | D |
| 880 | handle | -5.1202114 | D |
| 881 | handle | -5.0070473 | D |
| 882 | handle | -5.2511675 | D |
| 883 | handle | -4.9884421 | D |
| 884 | handle | -4.8688970 | D |
| 885 | handle | -5.0432504 | D |
| 886 | handle | -4.7570487 | D |
| 887 | handle | -4.9304549 | D |
| 888 | handle | -4.7570487 | D |
| 889 | handle | -5.0162217 | D |
| 890 | handle | -4.8256470 | D |
| 891 | handle | -4.6833958 | D |
| 892 | handle | -4.4869624 | D |
| 893 | handle | -5.0695677 | D |
| 894 | handle | -4.8256470 | D |
| 895 | handle | -4.8366361 | D |
| 896 | handle | -4.6310963 | D |
| 897 | handle | -4.8145358 | D |
| 898 | handle | -4.6575879 | D |
| 899 | handle | -5.0695677 | D |
| 900 | handle | -4.8256470 | D |
| 901 | handle | -4.3720821 | D |
| 902 | handle | -4.7688135 | D |
| 903 | handle | -4.7804416 | D |
| 904 | handle | -4.8366361 | D |
| 905 | handle | -5.2725969 | D |
| 906 | handle | -5.5803933 | D |
| 907 | handle | -1.9350930 | D |
| 908 | handle | -0.6678258 | D |
| 909 | handle | -0.5988811 | D |
| 910 | handle | 0.5048040 | D |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <fct> | <dbl> | <dbl> | <chr> |
| 863 | handle | 0.98036629 | 5.6617765 | D |
| 864 | handle | 0.88918593 | 5.6148419 | D |
| 865 | handle | 0.84169479 | 5.3758824 | D |
| 866 | handle | 0.29912089 | 5.2051635 | D |
| 867 | handle | 0.16526489 | 4.9301044 | D |
| 868 | handle | 0.06288616 | 4.5716542 | D |
| 869 | handle | -0.20596919 | 4.1886510 | D |
| 870 | handle | -0.72549520 | 3.5357299 | D |
| 871 | handle | -1.55513454 | 1.4974590 | D |
| 872 | handle | -2.34895633 | -5.8030172 | D |
| 873 | handle | -1.95661272 | 0.8720260 | D |
| 874 | handle | -2.97841919 | -5.0070473 | D |
| 875 | handle | -2.64947742 | -4.8366361 | D |
| 876 | handle | -2.95698400 | -4.8366361 | D |
| 877 | handle | -3.44412083 | -5.1762261 | D |
| 878 | handle | -4.16802016 | -4.9694842 | D |
| 879 | handle | -5.29303557 | -5.2511675 | D |
| 880 | handle | -5.85205047 | -5.1202114 | D |
| 881 | handle | -5.88254839 | -5.0070473 | D |
| 882 | handle | -5.98024507 | -5.2511675 | D |
| 883 | handle | -6.06716185 | -4.9884421 | D |
| 884 | handle | -6.01373034 | -4.8688970 | D |
| 885 | handle | -5.70197749 | -5.0432504 | D |
| 886 | handle | -5.51815555 | -4.7570487 | D |
| 887 | handle | -5.78259666 | -4.9304549 | D |
| 888 | handle | -5.63373396 | -4.7570487 | D |
| 889 | handle | -6.15858595 | -5.0162217 | D |
| 890 | handle | -5.68991098 | -4.8256470 | D |
| 891 | handle | -5.53246707 | -4.6833958 | D |
| 892 | handle | -5.61100571 | -4.4869624 | D |
| 893 | handle | -5.64331870 | -5.0695677 | D |
| 894 | handle | -5.66533216 | -4.8256470 | D |
| 895 | handle | -5.59444904 | -4.8366361 | D |
| 896 | handle | -5.55355748 | -4.6310963 | D |
| 897 | handle | -5.75738524 | -4.8145358 | D |
| 898 | handle | -5.63373396 | -4.6575879 | D |
| 899 | handle | -5.70795658 | -5.0695677 | D |
| 900 | handle | -5.59110455 | -4.8256470 | D |
| 901 | handle | -5.30174603 | -4.3720821 | D |
| 902 | handle | -5.58438184 | -4.7688135 | D |
| 903 | handle | -5.54306788 | -4.7804416 | D |
| 904 | handle | -5.53246707 | -4.8366361 | D |
| 905 | handle | -6.03114101 | -5.2725969 | D |
| 906 | handle | -6.31801369 | -5.5803933 | D |
| 907 | handle | -6.37620621 | -1.9350930 | D |
| 908 | handle | -5.67769708 | -0.6678258 | D |
| 909 | handle | -5.68687146 | -0.5988811 | D |
| 910 | handle | -6.18482926 | 0.5048040 | D |
`geom_smooth()` using formula = 'y ~ x' `geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_DMS_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-3.6070 -0.3981 -0.2273 0.4363 3.5226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.84137 0.22923 -12.39 2.89e-16 ***
score_nonwt 0.52217 0.04937 10.58 6.63e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.294 on 46 degrees of freedom
Multiple R-squared: 0.7086, Adjusted R-squared: 0.7023
F-statistic: 111.9 on 1 and 46 DF, p-value: 6.629e-14
In [131]:
#Dataframe for pooled capsid Deletion
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- filter(Competition_Pool_Enrich2_Scores_Capsid_Deletion,Variant %in% c("_1AAdel","_2AAdel","_3AAdel"))
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- Competition_Pool_Enrich2_Scores_Capsid_Deletion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_Deletion)[3])
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_1AAdel", "1AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_2AAdel", "2AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_3AAdel", "3AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion
#Dataframe for capsid Deletion non-normalized experiment
Capsid_deletion_N_terminus_2A <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_deletion_N_terminus_2A <- filter(Capsid_deletion_N_terminus_2A,AminoAcid %in% c("1AAdel","2AAdel","3AAdel"))
Capsid_deletion_N_terminus_2A <- Capsid_deletion_N_terminus_2A %>% rename(Variant = colnames(Capsid_deletion_N_terminus_2A)[2])
Capsid_deletion_N_terminus_2A <- Capsid_deletion_N_terminus_2A %>% rename(score_nonwt = colnames(Capsid_deletion_N_terminus_2A)[3])
Capsid_deletion_N_terminus_2A
Merged_Capsid_deletion_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Capsid_Deletion,Capsid_deletion_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Capsid_deletion_wt_nonwt
#Scatter plot capsid deletion wildtype vs non wildtype normalized
Scatter_plot_CapsidDeletion_wt_nonwt <- ggplot(Merged_Capsid_deletion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidDeletion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidDeletion_wt_nonwt,filename = "Scatter_plot_CapsidDeletion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_deletion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_deletion_wt_nonwt)
summary_model_Capsid_deletion <- summary(model_Capsid_deletion)
summary_model_Capsid_deletion
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <chr> | <dbl> |
| 577 | 1AAdel | -5.0212407 |
| 577 | 2AAdel | -6.7742591 |
| 577 | 3AAdel | -5.0938696 |
| 578 | 1AAdel | -4.2644383 |
| 578 | 2AAdel | -2.1168360 |
| 578 | 3AAdel | -4.2882355 |
| 579 | 1AAdel | 3.5386093 |
| 579 | 2AAdel | -0.5381575 |
| 579 | 3AAdel | -4.6268937 |
| 580 | 1AAdel | 1.2185610 |
| 580 | 2AAdel | -2.5369032 |
| 580 | 3AAdel | -6.3762062 |
| 581 | 1AAdel | -1.4574454 |
| 581 | 2AAdel | -2.5426362 |
| 581 | 3AAdel | -6.3608094 |
| 582 | 1AAdel | -2.5510748 |
| 582 | 2AAdel | -5.0276277 |
| 582 | 3AAdel | -6.4667129 |
| 583 | 1AAdel | -4.8748918 |
| 583 | 2AAdel | -6.3356707 |
| 583 | 3AAdel | -3.5367626 |
| 584 | 1AAdel | -2.1131554 |
| 584 | 2AAdel | 0.6757330 |
| 584 | 3AAdel | -1.3671723 |
| 585 | 1AAdel | 0.5374409 |
| 585 | 2AAdel | -0.7120180 |
| 585 | 3AAdel | -4.9806632 |
| 586 | 1AAdel | -1.2041708 |
| 586 | 2AAdel | -2.7770018 |
| 586 | 3AAdel | -3.7595006 |
| ⋮ | ⋮ | ⋮ |
| 603 | 1AAdel | -6.342015 |
| 603 | 2AAdel | -5.213287 |
| 603 | 3AAdel | -6.517064 |
| 604 | 1AAdel | -4.929478 |
| 604 | 2AAdel | -6.337260 |
| 604 | 3AAdel | -6.354584 |
| 605 | 1AAdel | -5.704178 |
| 605 | 2AAdel | -6.596136 |
| 605 | 3AAdel | -4.753775 |
| 606 | 1AAdel | -5.898499 |
| 606 | 2AAdel | -5.961616 |
| 606 | 3AAdel | -6.340433 |
| 607 | 1AAdel | -6.374677 |
| 607 | 2AAdel | -6.271609 |
| 607 | 3AAdel | -6.521037 |
| 608 | 1AAdel | -6.511741 |
| 608 | 2AAdel | -6.303332 |
| 608 | 3AAdel | -4.875257 |
| 609 | 1AAdel | -6.554829 |
| 609 | 2AAdel | -5.213596 |
| 609 | 3AAdel | -6.383817 |
| 610 | 1AAdel | -6.515736 |
| 610 | 2AAdel | -5.999529 |
| 610 | 3AAdel | -6.251043 |
| 611 | 1AAdel | -5.204323 |
| 611 | 2AAdel | -6.510406 |
| 611 | 3AAdel | -6.536775 |
| 612 | 1AAdel | -6.608324 |
| 612 | 2AAdel | -6.626332 |
| 612 | 3AAdel | -6.455485 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <fct> | <dbl> | <chr> |
| 577 | 1AAdel | -2.3578354 | C |
| 577 | 2AAdel | -2.7752988 | C |
| 577 | 3AAdel | -1.2981934 | C |
| 578 | 1AAdel | 2.6041417 | C |
| 578 | 2AAdel | -0.4533854 | C |
| 578 | 3AAdel | -2.4517096 | C |
| 579 | 1AAdel | 6.7995007 | C |
| 579 | 2AAdel | 3.6447267 | C |
| 579 | 3AAdel | -2.3689158 | C |
| 580 | 1AAdel | 4.1870394 | C |
| 580 | 2AAdel | 0.4254279 | C |
| 580 | 3AAdel | -2.1659979 | C |
| 581 | 1AAdel | 1.2360616 | C |
| 581 | 2AAdel | 1.1252104 | C |
| 581 | 3AAdel | -2.3853096 | C |
| 582 | 1AAdel | -0.9166535 | C |
| 582 | 2AAdel | -2.7379113 | C |
| 582 | 3AAdel | -2.3503796 | C |
| 583 | 1AAdel | -2.4120492 | C |
| 583 | 2AAdel | -2.4618022 | C |
| 583 | 3AAdel | -2.4173124 | C |
| 584 | 1AAdel | -0.7457405 | C |
| 584 | 2AAdel | 5.1846288 | C |
| 584 | 3AAdel | 3.0234802 | C |
| 585 | 1AAdel | 3.9092838 | C |
| 585 | 2AAdel | 3.9308834 | C |
| 585 | 3AAdel | -2.4517096 | C |
| 586 | 1AAdel | 1.4321011 | C |
| 586 | 2AAdel | -2.6951032 | C |
| 586 | 3AAdel | -2.7467609 | C |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 603 | 1AAdel | -2.415561 | C |
| 603 | 2AAdel | -2.518728 | C |
| 603 | 3AAdel | -2.619414 | C |
| 604 | 1AAdel | -2.314185 | C |
| 604 | 2AAdel | -2.296572 | C |
| 604 | 3AAdel | -2.488228 | C |
| 605 | 1AAdel | -2.507599 | C |
| 605 | 2AAdel | -2.635017 | C |
| 605 | 3AAdel | -2.401439 | C |
| 606 | 1AAdel | -2.489857 | C |
| 606 | 2AAdel | -2.520308 | C |
| 606 | 3AAdel | -2.415561 | C |
| 607 | 1AAdel | -2.406758 | C |
| 607 | 2AAdel | -2.390715 | C |
| 607 | 3AAdel | -2.625116 | C |
| 608 | 1AAdel | -2.569587 | C |
| 608 | 2AAdel | -2.426023 | C |
| 608 | 3AAdel | -2.488228 | C |
| 609 | 1AAdel | -2.539073 | C |
| 609 | 2AAdel | -2.424287 | C |
| 609 | 3AAdel | -2.266510 | C |
| 610 | 1AAdel | -2.512384 | C |
| 610 | 2AAdel | -2.476753 | C |
| 610 | 3AAdel | -2.250099 | C |
| 611 | 1AAdel | -2.321915 | C |
| 611 | 2AAdel | -2.486597 | C |
| 611 | 3AAdel | -2.453399 | C |
| 612 | 1AAdel | -2.534415 | C |
| 612 | 2AAdel | -2.692449 | C |
| 612 | 3AAdel | -2.607911 | C |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <chr> | <dbl> | <dbl> | <chr> |
| 577 | 1AAdel | -5.0212407 | -2.3578354 | C |
| 577 | 2AAdel | -6.7742591 | -2.7752988 | C |
| 577 | 3AAdel | -5.0938696 | -1.2981934 | C |
| 578 | 1AAdel | -4.2644383 | 2.6041417 | C |
| 578 | 2AAdel | -2.1168360 | -0.4533854 | C |
| 578 | 3AAdel | -4.2882355 | -2.4517096 | C |
| 579 | 1AAdel | 3.5386093 | 6.7995007 | C |
| 579 | 2AAdel | -0.5381575 | 3.6447267 | C |
| 579 | 3AAdel | -4.6268937 | -2.3689158 | C |
| 580 | 1AAdel | 1.2185610 | 4.1870394 | C |
| 580 | 2AAdel | -2.5369032 | 0.4254279 | C |
| 580 | 3AAdel | -6.3762062 | -2.1659979 | C |
| 581 | 1AAdel | -1.4574454 | 1.2360616 | C |
| 581 | 2AAdel | -2.5426362 | 1.1252104 | C |
| 581 | 3AAdel | -6.3608094 | -2.3853096 | C |
| 582 | 1AAdel | -2.5510748 | -0.9166535 | C |
| 582 | 2AAdel | -5.0276277 | -2.7379113 | C |
| 582 | 3AAdel | -6.4667129 | -2.3503796 | C |
| 583 | 1AAdel | -4.8748918 | -2.4120492 | C |
| 583 | 2AAdel | -6.3356707 | -2.4618022 | C |
| 583 | 3AAdel | -3.5367626 | -2.4173124 | C |
| 584 | 1AAdel | -2.1131554 | -0.7457405 | C |
| 584 | 2AAdel | 0.6757330 | 5.1846288 | C |
| 584 | 3AAdel | -1.3671723 | 3.0234802 | C |
| 585 | 1AAdel | 0.5374409 | 3.9092838 | C |
| 585 | 2AAdel | -0.7120180 | 3.9308834 | C |
| 585 | 3AAdel | -4.9806632 | -2.4517096 | C |
| 586 | 1AAdel | -1.2041708 | 1.4321011 | C |
| 586 | 2AAdel | -2.7770018 | -2.6951032 | C |
| 586 | 3AAdel | -3.7595006 | -2.7467609 | C |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 603 | 1AAdel | -6.342015 | -2.415561 | C |
| 603 | 2AAdel | -5.213287 | -2.518728 | C |
| 603 | 3AAdel | -6.517064 | -2.619414 | C |
| 604 | 1AAdel | -4.929478 | -2.314185 | C |
| 604 | 2AAdel | -6.337260 | -2.296572 | C |
| 604 | 3AAdel | -6.354584 | -2.488228 | C |
| 605 | 1AAdel | -5.704178 | -2.507599 | C |
| 605 | 2AAdel | -6.596136 | -2.635017 | C |
| 605 | 3AAdel | -4.753775 | -2.401439 | C |
| 606 | 1AAdel | -5.898499 | -2.489857 | C |
| 606 | 2AAdel | -5.961616 | -2.520308 | C |
| 606 | 3AAdel | -6.340433 | -2.415561 | C |
| 607 | 1AAdel | -6.374677 | -2.406758 | C |
| 607 | 2AAdel | -6.271609 | -2.390715 | C |
| 607 | 3AAdel | -6.521037 | -2.625116 | C |
| 608 | 1AAdel | -6.511741 | -2.569587 | C |
| 608 | 2AAdel | -6.303332 | -2.426023 | C |
| 608 | 3AAdel | -4.875257 | -2.488228 | C |
| 609 | 1AAdel | -6.554829 | -2.539073 | C |
| 609 | 2AAdel | -5.213596 | -2.424287 | C |
| 609 | 3AAdel | -6.383817 | -2.266510 | C |
| 610 | 1AAdel | -6.515736 | -2.512384 | C |
| 610 | 2AAdel | -5.999529 | -2.476753 | C |
| 610 | 3AAdel | -6.251043 | -2.250099 | C |
| 611 | 1AAdel | -5.204323 | -2.321915 | C |
| 611 | 2AAdel | -6.510406 | -2.486597 | C |
| 611 | 3AAdel | -6.536775 | -2.453399 | C |
| 612 | 1AAdel | -6.608324 | -2.534415 | C |
| 612 | 2AAdel | -6.626332 | -2.692449 | C |
| 612 | 3AAdel | -6.455485 | -2.607911 | C |
`geom_smooth()` using formula = 'y ~ x' `geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_deletion_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-3.3484 -0.8641 -0.3445 0.6851 3.3740
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.29606 0.14895 -22.13 <2e-16 ***
score_nonwt 0.91394 0.05857 15.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.141 on 106 degrees of freedom
Multiple R-squared: 0.6967, Adjusted R-squared: 0.6939
F-statistic: 243.5 on 1 and 106 DF, p-value: < 2.2e-16
In [132]:
#Dataframe for pooled replication Deletion
Competition_Pool_Enrich2_Scores_Replication_Deletion <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replication_Deletion <- filter(Competition_Pool_Enrich2_Scores_Replication_Deletion,Variant %in% c("_1AAdel","_2AAdel","_3AAdel"))
Competition_Pool_Enrich2_Scores_Replication_Deletion <- Competition_Pool_Enrich2_Scores_Replication_Deletion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replication_Deletion)[3])
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_1AAdel", "1AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_2AAdel", "2AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_3AAdel", "3AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion
#Dataframe for replication Deletion non-normalized experiment
Replication_deletion_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_deletion_N_terminus_2A <- filter(Replication_deletion_N_terminus_2A,AminoAcid %in% c("1AAdel","2AAdel","3AAdel"))
Replication_deletion_N_terminus_2A <- Replication_deletion_N_terminus_2A %>% rename(Variant = colnames(Replication_deletion_N_terminus_2A)[2])
Replication_deletion_N_terminus_2A <- Replication_deletion_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_deletion_N_terminus_2A)[3])
Replication_deletion_N_terminus_2A
Merged_Replication_deletion_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Replication_Deletion,Replication_deletion_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Replication_deletion_wt_nonwt
#Scatter plot replication deletion wildtype vs non wildtype normalized
Scatter_plot_ReplicationDeletion_wt_nonwt <- ggplot(Merged_Replication_deletion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationDeletion_wt_nonwt
#Save figute
ggsave(plot = Scatter_plot_ReplicationDeletion_wt_nonwt,filename = "Scatter_plot_ReplicationDeletion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Replication_deletion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_deletion_wt_nonwt)
summary_model_Replication_deletion <- summary(model_Replication_deletion)
summary_model_Replication_deletion
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <chr> | <dbl> |
| 863 | 1AAdel | -6.37467716 |
| 863 | 2AAdel | -4.75496860 |
| 863 | 3AAdel | -2.68158048 |
| 864 | 1AAdel | 1.10097240 |
| 864 | 2AAdel | -3.36379488 |
| 864 | 3AAdel | -2.64838944 |
| 865 | 1AAdel | 1.83509394 |
| 865 | 2AAdel | -1.72911126 |
| 865 | 3AAdel | -4.07427109 |
| 866 | 1AAdel | 0.43105445 |
| 866 | 2AAdel | -3.53712924 |
| 866 | 3AAdel | -3.88133406 |
| 867 | 1AAdel | 0.67720653 |
| 867 | 2AAdel | -4.84949044 |
| 867 | 3AAdel | -4.47663626 |
| 868 | 1AAdel | 0.47879095 |
| 868 | 2AAdel | -1.73860582 |
| 868 | 3AAdel | -6.50638976 |
| 869 | 1AAdel | 0.04180394 |
| 869 | 2AAdel | -3.90495543 |
| 869 | 3AAdel | -6.56373446 |
| 870 | 1AAdel | -0.87970812 |
| 870 | 2AAdel | -6.21758924 |
| 870 | 3AAdel | -6.23533594 |
| 871 | 1AAdel | -0.35578323 |
| 871 | 2AAdel | -5.83118922 |
| 871 | 3AAdel | -6.44698002 |
| 872 | 1AAdel | -4.74296224 |
| 872 | 2AAdel | -6.26821087 |
| 872 | 3AAdel | -6.60953454 |
| ⋮ | ⋮ | ⋮ |
| 901 | 1AAdel | -6.429751 |
| 901 | 2AAdel | -6.256224 |
| 901 | 3AAdel | -6.527625 |
| 902 | 1AAdel | -6.581312 |
| 902 | 2AAdel | -6.071316 |
| 902 | 3AAdel | -6.544552 |
| 903 | 1AAdel | -6.278372 |
| 903 | 2AAdel | -6.274996 |
| 903 | 3AAdel | -5.928528 |
| 904 | 1AAdel | -5.305298 |
| 904 | 2AAdel | -5.966531 |
| 904 | 3AAdel | -6.422483 |
| 905 | 1AAdel | -5.090539 |
| 905 | 2AAdel | -5.655986 |
| 905 | 3AAdel | -5.259759 |
| 906 | 1AAdel | -3.084304 |
| 906 | 2AAdel | -5.869954 |
| 906 | 3AAdel | -6.426850 |
| 907 | 1AAdel | -4.695391 |
| 907 | 2AAdel | -5.740217 |
| 907 | 3AAdel | -6.194037 |
| 908 | 1AAdel | -5.877530 |
| 908 | 2AAdel | -5.420188 |
| 908 | 3AAdel | -6.397371 |
| 909 | 1AAdel | -4.174291 |
| 909 | 2AAdel | -5.809883 |
| 909 | 3AAdel | -5.002473 |
| 910 | 1AAdel | -6.298389 |
| 910 | 2AAdel | -5.038871 |
| 910 | 3AAdel | -5.904827 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <fct> | <dbl> | <chr> |
| 863 | 1AAdel | -2.66641372 | D |
| 863 | 2AAdel | -4.88781015 | D |
| 863 | 3AAdel | 1.18925684 | D |
| 864 | 1AAdel | 7.19446630 | D |
| 864 | 2AAdel | 0.09504572 | D |
| 864 | 3AAdel | 1.18637549 | D |
| 865 | 1AAdel | 7.25664801 | D |
| 865 | 2AAdel | 2.59107135 | D |
| 865 | 3AAdel | 1.75828100 | D |
| 866 | 1AAdel | 6.18943321 | D |
| 866 | 2AAdel | 1.71522700 | D |
| 866 | 3AAdel | 0.19412594 | D |
| 867 | 1AAdel | 6.18370269 | D |
| 867 | 2AAdel | 2.31402344 | D |
| 867 | 3AAdel | 0.07227784 | D |
| 868 | 1AAdel | 6.40225455 | D |
| 868 | 2AAdel | 2.44891091 | D |
| 868 | 3AAdel | -4.65007706 | D |
| 869 | 1AAdel | 5.95561786 | D |
| 869 | 2AAdel | -0.34340309 | D |
| 869 | 3AAdel | -5.01976600 | D |
| 870 | 1AAdel | 5.21598662 | D |
| 870 | 2AAdel | -4.71201636 | D |
| 870 | 3AAdel | -4.69383404 | D |
| 871 | 1AAdel | 4.95562685 | D |
| 871 | 2AAdel | -4.13498249 | D |
| 871 | 3AAdel | -4.85711721 | D |
| 872 | 1AAdel | 2.35439207 | D |
| 872 | 2AAdel | -4.71201636 | D |
| 872 | 3AAdel | -4.91758903 | D |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 901 | 1AAdel | -4.753199 | D |
| 901 | 2AAdel | -4.371934 | D |
| 901 | 3AAdel | -4.650077 | D |
| 902 | 1AAdel | -4.610984 | D |
| 902 | 2AAdel | -4.363459 | D |
| 902 | 3AAdel | -4.955963 | D |
| 903 | 1AAdel | -4.405132 | D |
| 903 | 2AAdel | -4.236593 | D |
| 903 | 3AAdel | -4.922467 | D |
| 904 | 1AAdel | -4.113360 | D |
| 904 | 2AAdel | -4.166564 | D |
| 904 | 3AAdel | -4.363459 | D |
| 905 | 1AAdel | -4.549321 | D |
| 905 | 2AAdel | -4.437263 | D |
| 905 | 3AAdel | -4.236593 | D |
| 906 | 1AAdel | -4.468394 | D |
| 906 | 2AAdel | -3.779071 | D |
| 906 | 3AAdel | -4.687699 | D |
| 907 | 1AAdel | -3.934312 | D |
| 907 | 2AAdel | -4.302037 | D |
| 907 | 3AAdel | -4.371934 | D |
| 908 | 1AAdel | -3.907820 | D |
| 908 | 2AAdel | -4.363459 | D |
| 908 | 3AAdel | -4.556363 | D |
| 909 | 1AAdel | 1.005060 | D |
| 909 | 2AAdel | -4.009819 | D |
| 909 | 3AAdel | -4.535086 | D |
| 910 | 1AAdel | -4.197179 | D |
| 910 | 2AAdel | -4.009819 | D |
| 910 | 3AAdel | -4.460702 | D |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <chr> | <dbl> | <dbl> | <chr> |
| 863 | 1AAdel | -6.37467716 | -2.66641372 | D |
| 863 | 2AAdel | -4.75496860 | -4.88781015 | D |
| 863 | 3AAdel | -2.68158048 | 1.18925684 | D |
| 864 | 1AAdel | 1.10097240 | 7.19446630 | D |
| 864 | 2AAdel | -3.36379488 | 0.09504572 | D |
| 864 | 3AAdel | -2.64838944 | 1.18637549 | D |
| 865 | 1AAdel | 1.83509394 | 7.25664801 | D |
| 865 | 2AAdel | -1.72911126 | 2.59107135 | D |
| 865 | 3AAdel | -4.07427109 | 1.75828100 | D |
| 866 | 1AAdel | 0.43105445 | 6.18943321 | D |
| 866 | 2AAdel | -3.53712924 | 1.71522700 | D |
| 866 | 3AAdel | -3.88133406 | 0.19412594 | D |
| 867 | 1AAdel | 0.67720653 | 6.18370269 | D |
| 867 | 2AAdel | -4.84949044 | 2.31402344 | D |
| 867 | 3AAdel | -4.47663626 | 0.07227784 | D |
| 868 | 1AAdel | 0.47879095 | 6.40225455 | D |
| 868 | 2AAdel | -1.73860582 | 2.44891091 | D |
| 868 | 3AAdel | -6.50638976 | -4.65007706 | D |
| 869 | 1AAdel | 0.04180394 | 5.95561786 | D |
| 869 | 2AAdel | -3.90495543 | -0.34340309 | D |
| 869 | 3AAdel | -6.56373446 | -5.01976600 | D |
| 870 | 1AAdel | -0.87970812 | 5.21598662 | D |
| 870 | 2AAdel | -6.21758924 | -4.71201636 | D |
| 870 | 3AAdel | -6.23533594 | -4.69383404 | D |
| 871 | 1AAdel | -0.35578323 | 4.95562685 | D |
| 871 | 2AAdel | -5.83118922 | -4.13498249 | D |
| 871 | 3AAdel | -6.44698002 | -4.85711721 | D |
| 872 | 1AAdel | -4.74296224 | 2.35439207 | D |
| 872 | 2AAdel | -6.26821087 | -4.71201636 | D |
| 872 | 3AAdel | -6.60953454 | -4.91758903 | D |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 901 | 1AAdel | -6.429751 | -4.753199 | D |
| 901 | 2AAdel | -6.256224 | -4.371934 | D |
| 901 | 3AAdel | -6.527625 | -4.650077 | D |
| 902 | 1AAdel | -6.581312 | -4.610984 | D |
| 902 | 2AAdel | -6.071316 | -4.363459 | D |
| 902 | 3AAdel | -6.544552 | -4.955963 | D |
| 903 | 1AAdel | -6.278372 | -4.405132 | D |
| 903 | 2AAdel | -6.274996 | -4.236593 | D |
| 903 | 3AAdel | -5.928528 | -4.922467 | D |
| 904 | 1AAdel | -5.305298 | -4.113360 | D |
| 904 | 2AAdel | -5.966531 | -4.166564 | D |
| 904 | 3AAdel | -6.422483 | -4.363459 | D |
| 905 | 1AAdel | -5.090539 | -4.549321 | D |
| 905 | 2AAdel | -5.655986 | -4.437263 | D |
| 905 | 3AAdel | -5.259759 | -4.236593 | D |
| 906 | 1AAdel | -3.084304 | -4.468394 | D |
| 906 | 2AAdel | -5.869954 | -3.779071 | D |
| 906 | 3AAdel | -6.426850 | -4.687699 | D |
| 907 | 1AAdel | -4.695391 | -3.934312 | D |
| 907 | 2AAdel | -5.740217 | -4.302037 | D |
| 907 | 3AAdel | -6.194037 | -4.371934 | D |
| 908 | 1AAdel | -5.877530 | -3.907820 | D |
| 908 | 2AAdel | -5.420188 | -4.363459 | D |
| 908 | 3AAdel | -6.397371 | -4.556363 | D |
| 909 | 1AAdel | -4.174291 | 1.005060 | D |
| 909 | 2AAdel | -5.809883 | -4.009819 | D |
| 909 | 3AAdel | -5.002473 | -4.535086 | D |
| 910 | 1AAdel | -6.298389 | -4.197179 | D |
| 910 | 2AAdel | -5.038871 | -4.009819 | D |
| 910 | 3AAdel | -5.904827 | -4.460702 | D |
`geom_smooth()` using formula = 'y ~ x' `geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_deletion_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-2.6654 -0.5129 -0.2778 0.3367 3.6205
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.42809 0.12554 -27.31 <2e-16 ***
score_nonwt 0.53759 0.02835 18.97 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.998 on 142 degrees of freedom
Multiple R-squared: 0.717, Adjusted R-squared: 0.715
F-statistic: 359.7 on 1 and 142 DF, p-value: < 2.2e-16
In [133]:
#Dataframe for pooled capsid DMS
Competition_Pool_Enrich2_Scores_Capsid_DMS <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_DMS <- filter(Competition_Pool_Enrich2_Scores_Capsid_DMS,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Competition_Pool_Enrich2_Scores_Capsid_DMS <- Competition_Pool_Enrich2_Scores_Capsid_DMS %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_DMS)[3])
Competition_Pool_Enrich2_Scores_Capsid_DMS
#Dataframe for capsid DMS non-normalized experiment
Capsid_DMS_N_terminus_VP1 <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_DMS_N_terminus_VP1 <- filter(Capsid_DMS_N_terminus_VP1,AminoAcid %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Capsid_DMS_N_terminus_VP1 <- Capsid_DMS_N_terminus_VP1 %>% rename(Variant = colnames(Capsid_DMS_N_terminus_VP1)[2])
Capsid_DMS_N_terminus_VP1 <- Capsid_DMS_N_terminus_VP1 %>% rename(score_nonwt = colnames(Capsid_DMS_N_terminus_VP1)[3])
Capsid_DMS_N_terminus_VP1
Merged_Capsid_DMS_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Capsid_DMS,Capsid_DMS_N_terminus_VP1,all.x=TRUE,all.y=TRUE)
Merged_Capsid_DMS_wt_nonwt
#Scatter plot capsid DMS wildtype vs non wildtype normalized
Scatter_plot_CapsidDMS_wt_nonwt <- ggplot(Merged_Capsid_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Enrich2 score", y = "Enrich2 score WT normalized") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidDMS_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidDMS_wt_nonwt,filename = "Scatter_plot_CapsidDMS_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_DMS <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_DMS_wt_nonwt)
summary_model_Capsid_DMS <- summary(model_Capsid_DMS)
summary_model_Capsid_DMS
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <fct> | <dbl> |
| 577 | A | -4.0681920 |
| 577 | C | -5.4932380 |
| 577 | D | -5.6746217 |
| 577 | E | -5.3712305 |
| 577 | F | -5.1615022 |
| 577 | G | -5.0592367 |
| 577 | H | -5.2259058 |
| 577 | K | -1.8548324 |
| 577 | L | -2.1140470 |
| 577 | M | -0.7525873 |
| 577 | N | -5.2285782 |
| 577 | P | -4.3889911 |
| 577 | Q | -4.4021126 |
| 577 | R | -2.3668099 |
| 577 | S | -4.7022723 |
| 577 | T | -1.7286223 |
| 577 | V | 1.0628603 |
| 577 | W | -3.3041341 |
| 577 | Y | -3.2116524 |
| 578 | A | -3.4267051 |
| 578 | C | -6.2138290 |
| 578 | D | -4.6105516 |
| 578 | E | -1.2980551 |
| 578 | F | -5.8688034 |
| 578 | H | -5.2044387 |
| 578 | I | -5.1851741 |
| 578 | K | -5.5388190 |
| 578 | L | -4.2710684 |
| 578 | M | -5.9569094 |
| 578 | N | -5.0356990 |
| ⋮ | ⋮ | ⋮ |
| 611 | L | -4.620711 |
| 611 | M | -5.519603 |
| 611 | N | -5.876337 |
| 611 | P | -3.313134 |
| 611 | Q | -4.451732 |
| 611 | R | -3.899632 |
| 611 | S | -2.817052 |
| 611 | T | -1.878903 |
| 611 | V | -1.079373 |
| 611 | W | -5.388449 |
| 611 | Y | -5.721858 |
| 612 | A | -3.521517 |
| 612 | C | -5.183790 |
| 612 | D | -5.909934 |
| 612 | E | -5.746796 |
| 612 | F | -1.967298 |
| 612 | G | -3.988101 |
| 612 | H | -2.179719 |
| 612 | I | -3.771855 |
| 612 | K | -5.571930 |
| 612 | M | -5.527930 |
| 612 | N | -4.336194 |
| 612 | P | -1.406304 |
| 612 | Q | -5.722649 |
| 612 | R | -3.061978 |
| 612 | S | -2.072730 |
| 612 | T | -3.236892 |
| 612 | V | -4.351154 |
| 612 | W | -5.780988 |
| 612 | Y | -5.449361 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <fct> | <dbl> | <chr> |
| 577 | A | -5.1246825 | C |
| 577 | C | -5.2262348 | C |
| 577 | D | -4.7012331 | C |
| 577 | E | -5.2849820 | C |
| 577 | F | -5.6736885 | C |
| 577 | G | -6.4046472 | C |
| 577 | H | -5.4793909 | C |
| 577 | I | NA | C |
| 577 | K | -4.1152428 | C |
| 577 | L | -4.7887246 | C |
| 577 | M | -0.7580631 | C |
| 577 | N | -4.7874430 | C |
| 577 | P | -5.0204239 | C |
| 577 | Q | -5.2350027 | C |
| 577 | R | -5.5286590 | C |
| 577 | S | -6.5757933 | C |
| 577 | T | -4.7898422 | C |
| 577 | V | -0.5469281 | C |
| 577 | W | -5.0126960 | C |
| 577 | Y | -4.8839610 | C |
| 578 | A | -4.6214137 | C |
| 578 | C | -5.4494128 | C |
| 578 | D | -5.2709751 | C |
| 578 | E | -5.1585080 | C |
| 578 | F | -5.6481755 | C |
| 578 | G | NA | C |
| 578 | H | -5.3404873 | C |
| 578 | I | -5.2851494 | C |
| 578 | K | -4.4227698 | C |
| 578 | L | -4.4423021 | C |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 611 | M | -5.0338953 | C |
| 611 | N | -5.6659750 | C |
| 611 | P | -4.8742732 | C |
| 611 | Q | -3.8101310 | C |
| 611 | R | -4.6766472 | C |
| 611 | S | -4.8079484 | C |
| 611 | T | -0.9246352 | C |
| 611 | V | -4.5783403 | C |
| 611 | W | -4.6243165 | C |
| 611 | Y | -5.3610731 | C |
| 612 | A | -5.7091347 | C |
| 612 | C | -5.0978707 | C |
| 612 | D | -5.4483826 | C |
| 612 | E | -5.6151519 | C |
| 612 | F | -5.4403390 | C |
| 612 | G | -4.9803114 | C |
| 612 | H | -5.0385042 | C |
| 612 | I | -3.1776197 | C |
| 612 | K | -5.4654803 | C |
| 612 | L | NA | C |
| 612 | M | -5.0200747 | C |
| 612 | N | -4.9585691 | C |
| 612 | P | -5.5663309 | C |
| 612 | Q | -5.5409772 | C |
| 612 | R | -4.7718689 | C |
| 612 | S | -1.8884444 | C |
| 612 | T | -5.4643457 | C |
| 612 | V | -2.1406578 | C |
| 612 | W | -5.1927246 | C |
| 612 | Y | -4.9886971 | C |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <fct> | <dbl> | <dbl> | <chr> |
| 577 | P | -4.3889911 | -5.0204239 | C |
| 577 | G | -5.0592367 | -6.4046472 | C |
| 577 | Y | -3.2116524 | -4.8839610 | C |
| 577 | W | -3.3041341 | -5.0126960 | C |
| 577 | F | -5.1615022 | -5.6736885 | C |
| 577 | V | 1.0628603 | -0.5469281 | C |
| 577 | L | -2.1140470 | -4.7887246 | C |
| 577 | I | NA | NA | C |
| 577 | A | -4.0681920 | -5.1246825 | C |
| 577 | T | -1.7286223 | -4.7898422 | C |
| 577 | S | -4.7022723 | -6.5757933 | C |
| 577 | Q | -4.4021126 | -5.2350027 | C |
| 577 | N | -5.2285782 | -4.7874430 | C |
| 577 | M | -0.7525873 | -0.7580631 | C |
| 577 | C | -5.4932380 | -5.2262348 | C |
| 577 | E | -5.3712305 | -5.2849820 | C |
| 577 | D | -5.6746217 | -4.7012331 | C |
| 577 | R | -2.3668099 | -5.5286590 | C |
| 577 | K | -1.8548324 | -4.1152428 | C |
| 577 | H | -5.2259058 | -5.4793909 | C |
| 578 | P | -4.6974161 | -5.0899979 | C |
| 578 | G | NA | NA | C |
| 578 | Y | -4.7477759 | -5.6949842 | C |
| 578 | W | -2.2799074 | -1.9766006 | C |
| 578 | F | -5.8688034 | -5.6481755 | C |
| 578 | V | -2.0448772 | -4.9806819 | C |
| 578 | L | -4.2710684 | -4.4423021 | C |
| 578 | I | -5.1851741 | -5.2851494 | C |
| 578 | A | -3.4267051 | -4.6214137 | C |
| 578 | T | -5.0370794 | -6.1043482 | C |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 611 | S | -2.817052 | -4.807948 | C |
| 611 | Q | -4.451732 | -3.810131 | C |
| 611 | N | -5.876337 | -5.665975 | C |
| 611 | M | -5.519603 | -5.033895 | C |
| 611 | C | -4.839482 | -5.190619 | C |
| 611 | E | -2.839048 | -2.829627 | C |
| 611 | D | -4.436985 | -4.531521 | C |
| 611 | R | -3.899632 | -4.676647 | C |
| 611 | K | -6.488425 | -4.928408 | C |
| 611 | H | -1.862492 | -1.780014 | C |
| 612 | P | -1.406304 | -5.566331 | C |
| 612 | G | -3.988101 | -4.980311 | C |
| 612 | Y | -5.449361 | -4.988697 | C |
| 612 | W | -5.780988 | -5.192725 | C |
| 612 | F | -1.967298 | -5.440339 | C |
| 612 | V | -4.351154 | -2.140658 | C |
| 612 | L | NA | NA | C |
| 612 | I | -3.771855 | -3.177620 | C |
| 612 | A | -3.521517 | -5.709135 | C |
| 612 | T | -3.236892 | -5.464346 | C |
| 612 | S | -2.072730 | -1.888444 | C |
| 612 | Q | -5.722649 | -5.540977 | C |
| 612 | N | -4.336194 | -4.958569 | C |
| 612 | M | -5.527930 | -5.020075 | C |
| 612 | C | -5.183790 | -5.097871 | C |
| 612 | E | -5.746796 | -5.615152 | C |
| 612 | D | -5.909934 | -5.448383 | C |
| 612 | R | -3.061978 | -4.771869 | C |
| 612 | K | -5.571930 | -5.465480 | C |
| 612 | H | -2.179719 | -5.038504 | C |
`geom_smooth()` using formula = 'y ~ x' Warning message: “Removed 36 rows containing non-finite values (`stat_smooth()`).” Warning message: “Removed 36 rows containing missing values (`geom_point()`).” `geom_smooth()` using formula = 'y ~ x' Warning message: “Removed 36 rows containing non-finite values (`stat_smooth()`).” Warning message: “Removed 36 rows containing missing values (`geom_point()`).”
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_DMS_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-3.2593 -0.8063 0.0975 0.6572 4.3371
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.96786 0.06285 15.40 <2e-16 ***
score_nonwt 0.96218 0.01809 53.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.136 on 682 degrees of freedom
(36 observations deleted due to missingness)
Multiple R-squared: 0.8057, Adjusted R-squared: 0.8054
F-statistic: 2828 on 1 and 682 DF, p-value: < 2.2e-16
In [134]:
#Dataframe for pooled replication DMS
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- filter(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- Competition_Pool_Enrich2_Scores_Replicationproteins_DMS %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS)[3])
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS
#Dataframe for replication DMS non-normalized experiment
Replication_DMS_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_DMS_N_terminus_2A <- filter(Replication_DMS_N_terminus_2A,AminoAcid %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Replication_DMS_N_terminus_2A <- Replication_DMS_N_terminus_2A %>% rename(Variant = colnames(Replication_DMS_N_terminus_2A)[2])
Replication_DMS_N_terminus_2A <- Replication_DMS_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_DMS_N_terminus_2A)[3])
Replication_DMS_N_terminus_2A
Merged_Replication_DMS_wt_nonwt <- merge(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS,Replication_DMS_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Replication_DMS_wt_nonwt
#Scatter plot replication DMS wildtype vs non wildtype normalized
Scatter_plot_ReplicationDMS_wt_nonwt <- ggplot(Merged_Replication_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationDMS_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_ReplicationDMS_wt_nonwt,filename = "Scatter_plot_ReplicationDMS_wt_nonwt.pdf",device = pdf,width = 2,height=2)
model_Replication_DMS <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_DMS_wt_nonwt)
#Linear model for wildtype vs nonwildtype experiment
summary_model_Replication_DMS <- summary(model_Replication_DMS)
summary_model_Replication_DMS
| position | Variant | score_wt |
|---|---|---|
| <dbl> | <fct> | <dbl> |
| 863 | A | -2.63880062 |
| 863 | C | -6.55351361 |
| 863 | D | -4.83644883 |
| 863 | E | -1.12910875 |
| 863 | F | -6.00888197 |
| 863 | H | -5.41506872 |
| 863 | I | -5.19235341 |
| 863 | K | -4.49803959 |
| 863 | L | -4.31494999 |
| 863 | M | -6.12968562 |
| 863 | N | -3.83184087 |
| 863 | P | -4.45111377 |
| 863 | Q | -4.27688280 |
| 863 | R | -1.01681740 |
| 863 | S | -2.77112089 |
| 863 | T | -5.63208995 |
| 863 | V | -3.21275793 |
| 863 | W | -3.53544384 |
| 863 | Y | -5.54266438 |
| 864 | A | 1.13692234 |
| 864 | C | 1.16364994 |
| 864 | D | 0.88141774 |
| 864 | E | 0.81518223 |
| 864 | F | -0.40552718 |
| 864 | G | 0.67202014 |
| 864 | H | 1.10202831 |
| 864 | I | 0.82939297 |
| 864 | L | 1.18361369 |
| 864 | M | 1.51883201 |
| 864 | N | -0.07915795 |
| ⋮ | ⋮ | ⋮ |
| 909 | L | -1.0403038 |
| 909 | M | 1.1010109 |
| 909 | N | 0.9882308 |
| 909 | P | -2.4580308 |
| 909 | Q | 1.2878565 |
| 909 | R | 1.1091144 |
| 909 | S | 0.2636672 |
| 909 | T | 0.8568695 |
| 909 | V | 0.2414721 |
| 909 | W | 1.0353956 |
| 909 | Y | 1.1249580 |
| 910 | A | 0.9576387 |
| 910 | C | 0.3175552 |
| 910 | D | 1.1818012 |
| 910 | E | 1.4313338 |
| 910 | F | 1.3379674 |
| 910 | G | 0.5113716 |
| 910 | H | 0.9969034 |
| 910 | I | -4.6552593 |
| 910 | K | -1.4978718 |
| 910 | L | -0.6404986 |
| 910 | M | 0.9430934 |
| 910 | N | 0.6567146 |
| 910 | P | -3.4641825 |
| 910 | R | -1.2790269 |
| 910 | S | -0.4744694 |
| 910 | T | -3.2459876 |
| 910 | V | -4.7139707 |
| 910 | W | 1.4848147 |
| 910 | Y | 1.5962951 |
| position | Variant | score_nonwt | sub |
|---|---|---|---|
| <dbl> | <fct> | <dbl> | <chr> |
| 863 | A | -0.6993640 | D |
| 863 | C | -4.6222291 | D |
| 863 | D | -2.7010239 | D |
| 863 | E | -0.3387277 | D |
| 863 | F | -2.7118695 | D |
| 863 | G | NA | D |
| 863 | H | -4.0723297 | D |
| 863 | I | -4.3739400 | D |
| 863 | K | -2.7132862 | D |
| 863 | L | -4.9859243 | D |
| 863 | M | -3.3827474 | D |
| 863 | N | -2.3354482 | D |
| 863 | P | -4.0953320 | D |
| 863 | Q | -3.9074147 | D |
| 863 | R | -4.1777451 | D |
| 863 | S | -3.6059825 | D |
| 863 | T | -3.3654838 | D |
| 863 | V | -3.7654225 | D |
| 863 | W | -1.2054866 | D |
| 863 | Y | -4.8484681 | D |
| 864 | A | 2.0226372 | D |
| 864 | C | 1.7417477 | D |
| 864 | D | 1.5003729 | D |
| 864 | E | 1.6555950 | D |
| 864 | F | -0.3130177 | D |
| 864 | G | 1.3599511 | D |
| 864 | H | 1.9457185 | D |
| 864 | I | 1.7591695 | D |
| 864 | K | NA | D |
| 864 | L | 1.9208852 | D |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 909 | M | 2.1335153 | D |
| 909 | N | 1.9037633 | D |
| 909 | P | -4.2713596 | D |
| 909 | Q | 2.1386790 | D |
| 909 | R | 2.0586316 | D |
| 909 | S | 2.5828672 | D |
| 909 | T | 1.9927668 | D |
| 909 | V | 1.4718492 | D |
| 909 | W | 1.6052248 | D |
| 909 | Y | 1.6508510 | D |
| 910 | A | 1.8229694 | D |
| 910 | C | 1.3766078 | D |
| 910 | D | 2.0644396 | D |
| 910 | E | 1.6048646 | D |
| 910 | F | 1.9512124 | D |
| 910 | G | 1.2942231 | D |
| 910 | H | 1.6815032 | D |
| 910 | I | -3.8056666 | D |
| 910 | K | -0.2243074 | D |
| 910 | L | -0.2942483 | D |
| 910 | M | 1.5960049 | D |
| 910 | N | 1.4869589 | D |
| 910 | P | -4.0244476 | D |
| 910 | Q | NA | D |
| 910 | R | -3.6907624 | D |
| 910 | S | 1.1593934 | D |
| 910 | T | -2.5391106 | D |
| 910 | V | -4.1553262 | D |
| 910 | W | 2.1430312 | D |
| 910 | Y | 2.6162566 | D |
| position | Variant | score_wt | score_nonwt | sub |
|---|---|---|---|---|
| <dbl> | <fct> | <dbl> | <dbl> | <chr> |
| 863 | P | -4.4511138 | -4.0953320 | D |
| 863 | G | NA | NA | D |
| 863 | Y | -5.5426644 | -4.8484681 | D |
| 863 | W | -3.5354438 | -1.2054866 | D |
| 863 | F | -6.0088820 | -2.7118695 | D |
| 863 | V | -3.2127579 | -3.7654225 | D |
| 863 | L | -4.3149500 | -4.9859243 | D |
| 863 | I | -5.1923534 | -4.3739400 | D |
| 863 | A | -2.6388006 | -0.6993640 | D |
| 863 | T | -5.6320899 | -3.3654838 | D |
| 863 | S | -2.7711209 | -3.6059825 | D |
| 863 | Q | -4.2768828 | -3.9074147 | D |
| 863 | N | -3.8318409 | -2.3354482 | D |
| 863 | M | -6.1296856 | -3.3827474 | D |
| 863 | C | -6.5535136 | -4.6222291 | D |
| 863 | E | -1.1291087 | -0.3387277 | D |
| 863 | D | -4.8364488 | -2.7010239 | D |
| 863 | R | -1.0168174 | -4.1777451 | D |
| 863 | K | -4.4980396 | -2.7132862 | D |
| 863 | H | -5.4150687 | -4.0723297 | D |
| 864 | P | 0.8823261 | 1.8559857 | D |
| 864 | G | 0.6720201 | 1.3599511 | D |
| 864 | Y | -0.1301695 | -0.5660657 | D |
| 864 | W | 0.5788173 | 0.7805062 | D |
| 864 | F | -0.4055272 | -0.3130177 | D |
| 864 | V | 1.4461590 | 2.2089784 | D |
| 864 | L | 1.1836137 | 1.9208852 | D |
| 864 | I | 0.8293930 | 1.7591695 | D |
| 864 | A | 1.1369223 | 2.0226372 | D |
| 864 | T | 1.4475619 | 2.3093373 | D |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 909 | S | 0.2636672 | 2.5828672 | D |
| 909 | Q | 1.2878565 | 2.1386790 | D |
| 909 | N | 0.9882308 | 1.9037633 | D |
| 909 | M | 1.1010109 | 2.1335153 | D |
| 909 | C | 0.2593156 | 2.1600574 | D |
| 909 | E | 1.3383096 | 1.5110635 | D |
| 909 | D | -1.5720789 | -1.3073061 | D |
| 909 | R | 1.1091144 | 2.0586316 | D |
| 909 | K | 1.5141629 | 2.2983104 | D |
| 909 | H | 1.2846214 | 2.1933719 | D |
| 910 | P | -3.4641825 | -4.0244476 | D |
| 910 | G | 0.5113716 | 1.2942231 | D |
| 910 | Y | 1.5962951 | 2.6162566 | D |
| 910 | W | 1.4848147 | 2.1430312 | D |
| 910 | F | 1.3379674 | 1.9512124 | D |
| 910 | V | -4.7139707 | -4.1553262 | D |
| 910 | L | -0.6404986 | -0.2942483 | D |
| 910 | I | -4.6552593 | -3.8056666 | D |
| 910 | A | 0.9576387 | 1.8229694 | D |
| 910 | T | -3.2459876 | -2.5391106 | D |
| 910 | S | -0.4744694 | 1.1593934 | D |
| 910 | Q | NA | NA | D |
| 910 | N | 0.6567146 | 1.4869589 | D |
| 910 | M | 0.9430934 | 1.5960049 | D |
| 910 | C | 0.3175552 | 1.3766078 | D |
| 910 | E | 1.4313338 | 1.6048646 | D |
| 910 | D | 1.1818012 | 2.0644396 | D |
| 910 | R | -1.2790269 | -3.6907624 | D |
| 910 | K | -1.4978718 | -0.2243074 | D |
| 910 | H | 0.9969034 | 1.6815032 | D |
`geom_smooth()` using formula = 'y ~ x' Warning message: “Removed 48 rows containing non-finite values (`stat_smooth()`).” Warning message: “Removed 48 rows containing missing values (`geom_point()`).” `geom_smooth()` using formula = 'y ~ x' Warning message: “Removed 48 rows containing non-finite values (`stat_smooth()`).” Warning message: “Removed 48 rows containing missing values (`geom_point()`).”
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_DMS_wt_nonwt)
Residuals:
Min 1Q Median 3Q Max
-3.3071 -0.5497 0.0199 0.4073 4.0668
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.00067 0.03475 -28.79 <2e-16 ***
score_nonwt 0.92497 0.01518 60.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.01 on 910 degrees of freedom
(48 observations deleted due to missingness)
Multiple R-squared: 0.8031, Adjusted R-squared: 0.8028
F-statistic: 3711 on 1 and 910 DF, p-value: < 2.2e-16
In [ ]: